Regular Language Closure Unconcatenation - regex

I'm trying to find an operation that can take a regular language and "unconcatenate" it with another. For example:
a*L - a* = L | where L is a regular language
I know that difference (subtraction) isn't the operation I want. But I believe I'm getting my point across.
Another way to look at it: suppose we have a set L that is logically equal to (A ∪ B), but we do not have access to A. If we can only use L, B, and sets derived from them, can we somehow derive A? Basically:
L - B = A | L = (A ∪ B)
I have put plenty of thought into this problem, using many variations of complement, intersection, and other closure properties of regular languages, but I simply can't figure it out.
The best I've managed to come up with is:
A = (L - B) ∪ (A ∩ B) | L = (A ∪ B)
However this requires A on the right side.

If L = A U B, define an operator - such that L - B = A.
The problem with this is that the operator - is not well-defined: Given L and B, there are potentially several languages which satisfy L = A U B. In particular, if A is a subset of L and any (possibly improper) superset of L \ B, then A is a solution; that is, if A = (L \ B) U C, where C is a (possibly improper) subset of B, then L - B might as well be equal to that set.
Now, you could define - to mean the set of all such A, and in that case, you could make this workable using set difference, union and power set operators. Then, L - B = Q where Q = {(L \ B) U {}, (L \ B) U {B[0]}, ..., (L \ B) U B = L}.
You can make this well-defined if you specify - always returns the "smallest" element of Q (for finite sets, the one with the fewest elements; for infinite sets, the one which is a subset of all other sets) in which case you recover simply L \ B.
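As a small finite illustration of the argument above, here is a Python sketch. The sets L and B and the helper name union_solutions are made up for this example and are not part of the question. It enumerates every A with A ∪ B = L and checks that the smallest one is L \ B.

```python
from itertools import combinations

# Hypothetical finite "languages", chosen only for illustration.
L = {"a", "ab", "b", "ba"}
B = {"b", "ba"}


def union_solutions(L, B):
    """Return every set A with A | B == L (empty list if B is not a subset of L)."""
    if not B <= L:
        return []
    base = L - B                 # every solution must contain L \ B
    items = sorted(B)
    # Each solution is (L \ B) plus some subset C of B.
    return [base | set(extra)
            for r in range(len(items) + 1)
            for extra in combinations(items, r)]


solutions = union_solutions(L, B)
smallest = min(solutions, key=len)   # the unique solution contained in all others
print(len(solutions), smallest == L - B)
```

With a two-element B there are four candidate sets A, which is why "-" only becomes an operator once you pick one of them by convention.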
If L = B.A, define an operator - such that L - B = A.
A similar problem exists here: there may be several languages which, when appended to B, give L. For example, consider B = a*, and two choices for A: a* and {e}, the language containing only the empty string. You can show without much effort that a* a* = a* e, so L is the same either way, B is the same, and L - B would now have to produce two different values: either a* or {e}.
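The same ambiguity can be checked mechanically on a length-bounded slice of the languages. This is only a sketch: the bound N and the helper concat are assumptions introduced for the illustration, and truncating to length N is exact for all strings of length at most N.

```python
def concat(X, Y, max_len):
    """All strings x + y with x in X and y in Y, kept only up to max_len."""
    return {x + y for x in X for y in Y if len(x + y) <= max_len}


N = 6                                        # arbitrary length bound
a_star = {"a" * i for i in range(N + 1)}     # a* restricted to length <= N
eps = {""}                                   # {e}: only the empty string

left = concat(a_star, a_star, N)             # B.A with A = a*
right = concat(a_star, eps, N)               # B.A with A = {e}
print(left == right == a_star)               # True
```

Both choices of A give the same L, so no operator can recover "the" A from L and B alone.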

Simplify expression with assumptions involving relations between variables in SymPy

Is it possible to simplify an expression in SymPy, if we know that variables satisfy certain equation?
For example in Mathematica we can write something like this:
Simplify[a+b-c, a+b==c]
Of course, in this case it is possible to solve for a and make a substitution. However, for long expressions a global substitution might not make sense. If the goal is to produce the shortest expression possible, one might need to apply the substitution to certain terms only and leave the rest untouched, or solve for b instead of a.
I think the sympy.assumptions module cannot impose restrictions that relate several variables to one another.
Is it possible to achieve the functionality of Mathematica's Simplify[expr, assum] in any other way in SymPy?
Or is there any other open-source project which can do something like this?
SymPy's current assumptions system cannot handle relationships between variables, although that is being worked on. There are a couple of ways that you can do this, though.
The ratsimpmodprime function simplifies an expression that is polynomial in some symbols based on knowing that the symbols themselves satisfy polynomial equations. We can use this to make a function that simplifies the example you showed:
In [26]: a, b, c = symbols('a:c')
In [27]: polysimp = lambda expr, eqs: ratsimpmodprime(expr, groebner(eqs).exprs)
In [28]: polysimp(a + b - c, [a + b - c])
Out[28]: 0
In [29]: polysimp(a + b, [a + b - c])
Out[29]: c
In [31]: polysimp(a**4 + b - c, [a**2 - b, b - c])
Out[31]: c**2
You can also introduce a new symbol (z below) and solve for it along with the other equations as a combined system:
In [33]: solve([z - (a + b - c), a + b - c])[z]
Out[33]: 0
This method has the advantage that you can choose which symbols you want to eliminate e.g.:
In [38]: solve([z - (a + b), a + b - c], [z, c])[z]
Out[38]: a + b
In [39]: solve([z - (a + b), a + b - c], [z, b])[z]
Out[39]: c
Either answer is valid since a + b == c so the expected output from "simplifying" is ambiguous.

Haskell function to work on integer tuple

I have just started learning Haskell and am trying to create a function that performs several checks on a tuple containing 6 integers.
These checks include:
all digits are different;
alternate digits are even and odd, or odd and even;
alternate digits differ by more than two;
the first and middle pairs of digits form numbers that are both multiples of the last
The problem is that I can attempt this and have some working functions like
contains e [] = False
contains e (x:xs)
  | x == e = True
  | otherwise = contains e xs
unique :: [Int] -> Bool
unique [] = True
unique (x:xs)
  | contains x xs = False
  | otherwise = unique xs
for the first requirement, but as you can see this relies on using a list rather than a tuple.
I would appreciate it if someone could help me with how to create these functions for tuples instead, as well as any code efficiency suggestions.
You can convert a 6-tuple to a list, with:
tuple6ToList :: (a, a, a, a, a, a) -> [a]
tuple6ToList (a, b, c, d, e, f) = [a, b, c, d, e, f]
and then run the checks on the list for example. This is likely simpler, since one can then recurse on the list, whereas for a tuple it would mean that you "unwind" the checks into individual checks on the elements.
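To show the shape of the "convert once, then check the list" approach, here is a rough sketch in Python rather than Haskell. All function names are hypothetical, and the readings of the requirements are assumptions: "alternate digits" is taken to mean neighbouring digits, and the "pairs" are read as the two-digit numbers 10*a+b, 10*c+d and 10*e+f.

```python
def tuple6_to_list(t):
    """Unpack a 6-tuple into a list so the checks can iterate over it."""
    a, b, c, d, e, f = t
    return [a, b, c, d, e, f]


def all_different(xs):
    return len(set(xs)) == len(xs)


def parity_alternates(xs):
    # Each neighbouring pair must have one even and one odd digit.
    return all(x % 2 != y % 2 for x, y in zip(xs, xs[1:]))


def neighbours_differ_by_more_than_two(xs):
    # Assumption: "alternate digits" means neighbouring digits.
    return all(abs(x - y) > 2 for x, y in zip(xs, xs[1:]))


def pairs_are_multiples_of_last(xs):
    # Assumption: pairs are read as two-digit numbers.
    first, middle, last = (10 * xs[i] + xs[i + 1] for i in (0, 2, 4))
    return last != 0 and first % last == 0 and middle % last == 0


def valid(t):
    xs = tuple6_to_list(t)
    checks = (all_different, parity_alternates,
              neighbours_differ_by_more_than_two, pairs_are_multiples_of_last)
    return all(check(xs) for check in checks)


print(valid((4, 9, 6, 3, 0, 7)))   # 49 and 63 are both multiples of 07
```

The tuple is unpacked in exactly one place, so every check stays a plain list function, which is the same design the Haskell conversion above enables.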

Interleaving in OCaml

I am trying to create a function which interleaves a pair of triples such as ((6, 3, 2), (4, 5, 1)) and creates a 6-tuple out of this interleaving.
I did some research but could not understand how interleaving is supposed to work, so I tried something on my own and ended up with code that creates a 6-tuple, but not in the right interleaved way. This is my code:
let interleave ((a, b, c), (a', b', c')) =
  let sort2 (a, b) = if a > b then (a, b) else (b, a) in
  let sort3 (a, b, c) =
    let (a, b) = sort2 (a, b) in
    let (b, c) = sort2 (b, c) in
    let (a, b) = sort2 (a, b) in
    (a, b, c) in
  let touch ((x), (y)) =
    let (x) = sort3 (x) in
    let (y) = sort3 (y) in
    ((x),(y)) in
  let ((a, b, c), (a', b', c')) = touch ((a, b, c), (a', b', c')) in
  (a, b', a', b, c, c');;
Can someone please explain how, and with which functions, I can achieve a proper form of interleaving? I haven't learned about recursion and lists yet, in case you wonder why I am trying to do it this way.
Thank you already.
The problem statement uses the word "max" without defining it. If you use the built-in compare function of OCaml as your definition, it uses lexicographic order. So you want the largest value (of the 6 values) in the first position in the 6-tuple, the second largest value next, and so on.
This should be pretty easy given your previously established skill with the sorting of tuples.
For what it's worth, there doesn't seem to be much value in preserving the identities of the two 3-tuples. Once inside the outermost function you can just work with the 6 values as a 6-tuple. Or so it would seem to me.
Update
From your example (should probably have given it at the beginning :-) it's pretty clear what you're being asked to do. You want to end up with a sequence in which the elements of the original tuples are in their original order, but they can be interleaved arbitrarily. This is often called a "shuffle" (or a merge). You have to find the shuffle that has the maximum value lexicographically.
If you reason this out, it amounts to taking whichever value is largest from the front of the two tuples and putting it next in the output.
This is much easier to do with lists.
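To illustrate the list-based idea, here is a minimal sketch in Python rather than OCaml. The function name max_shuffle is made up. It also goes one step beyond the description above: when the two front values are equal, it compares the whole remaining sequences instead of only the heads, which is an assumption about how ties should be broken.

```python
def max_shuffle(xs, ys):
    """Merge xs and ys, preserving each one's internal order, so that the
    result is lexicographically as large as possible."""
    xs, ys = list(xs), list(ys)
    out = []
    while xs or ys:
        # List comparison is lexicographic, so this picks the larger front
        # value and falls back to the following elements on a tie.
        if xs > ys:
            out.append(xs.pop(0))
        else:
            out.append(ys.pop(0))
    return tuple(out)


print(max_shuffle((6, 3, 2), (4, 5, 1)))   # (6, 4, 5, 3, 2, 1)
```

An empty list compares smaller than any non-empty one, so the loop drains whichever input is left over without a special case.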
Now that I understand what your end-goal is . . .
Since tuples of n elements are different types for different n's, you need to define helper functions for manipulating different sizes of tuples.
One approach, that basically mimics a recursive function over lists (but requires many extra functions because of tuples all having different types), is to have two sets of helper functions:
functions that prepend a value to an existing tuple: prepend_to_2, up through prepend_to_5. For example,
let prepend_to_3 (a, (b, c, d)) = (a, b, c, d)
functions that interleave two tuples of each possible size up to 3: interleave_1_1, interleave_1_2, interleave_1_3, interleave_2_2, interleave_2_3, and interleave_3_3. (Note that we don't need e.g. interleave_2_1, because we can just call interleave_1_2 with the arguments in the opposite order.) For example,
let interleave_2_2 ((a, b), (a', b')) =
  if a > a'
  then prepend_to_3 (a, interleave_1_2 (b, (a', b')))
  else prepend_to_3 (a', interleave_1_2 (b', (a, b)))
(Do you see how that works?)
Then interleave is just interleave_3_3.
With lists and recursion this would be much simpler, since a single function can operate on lists of any length, so you don't need multiple different copies of the same logic.

What are the type of Strings generated by (a*+b*)

Besides strings of any number of a's or b's, like aa... or bb..., would the regular expression (a*+b*) contain a string like
ab
or any string ending with b?
Is (a*+b*) same as (a* b*) ?
I am a little bit confused about the strings generated by the regular expression (a*+b*) and would really appreciate it if someone could help.
Unless you're working with a regex language which explicitly classifies the *+ as a special token which either has a special meaning, or is reserved for future extension (and produces defined behavior now, or a syntax error), the natural parse of a*+ is that it means (a*)+: the postfix + is applied to the expression a*.
If that interpretation applies, next we can observe that (a*)+ is equivalent to just a*. Therefore a*+b* is the same as a*b*.
Firstly, by definition R+ means RR*. Match one R and then zero or more of them. Therefore, we can rewrite (a*)+ as (a*)(a*)*.
Secondly, * is idempotent, so (a*)* is just (a*). If we match "zero or more a", zero or more times, nothing changes; the net effect is zero or more a. Proof: R* denotes this infinite expansion: (|R|RR|RRR|RRRR|RRRRR|...): match nothing, or match one R, or match two R's, ... Therefore, (a*)* denotes this expansion: (|a*|a*a*|a*a*a*|...). These inner a*-s in turn denote individual second-level expansions: (|(|a|aa|aaa|...)|(|a|aa|aaa|...)(|a|aa|aaa|...)|...). By the associative property of the branch |, we can flatten a structure like (a|(b|c)) into (a|b|c), and when we do this to the expansion, we note that there are numerous identical terms: the empty regex (), the single a, the double aa and so on. These all reduce to a single copy, because (|||) is equivalent to () and (a|a|a|a|...) is equivalent to just (a) and so on. That is to say, when we sort the terms by increasing length, and squash multiple identical terms to just one copy, we end up with (|a|aa|aaa|aaaa|...), which is recognizable as the expansion of just a*. Thus (a*)* is a*.
Lastly, (a*)(a*) just means a*. Proof: Similarly to the previous, we expand into branches: (|a|aa|aaa|...)(|a|aa|aaa|...). Next we note that the catenation of branch expressions is equivalent to a Cartesian product of the sets of terms. That is to say, (a|b|c|...)(i|j|k|...) means, precisely: (ai|aj|ak|...|bi|bj|bk|...|ci|cj|ck|...|...). When we apply this product to (|a|aa|aaa|...)(|a|aa|aaa|...) we get a proliferation of terms, which, when arranged in increasing length and subject to de-duplication, reduce to (|a|aa|aaa|aaaa|...), and that is just a*.
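These identities can be spot-checked with Python's re module by comparing the sets of short strings each pattern fully matches. This is a sanity check up to a length bound, not a proof. The helper language and the bound of 5 are assumptions for the illustration, and the patterns use explicit non-capturing groups so that the engine does not read `*+` as a special token of its own.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=5):
    """Strings over `alphabet` of length <= max_len that the pattern fully matches."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(chars))
    }


base = language(r"a*")
same = all(language(p) == base
           for p in (r"(?:a*)+", r"(?:a*)*", r"a*a*"))
print(same)

# Under this parse, a*+b* behaves like a*b*, so it does match "ab".
print("ab" in language(r"(?:a*)+b*"))
```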
I think it helps here to look at a formal definition of regular expressions, i.e. to look for each regular expression e which language L(e) does it produce.
So let's start simple:
(1)
What about the regexp a (only the letter)? Its language is
L(a) := {a},
just the single word/character "a".
(2)
For the regexp e1 + e2, where e1 and e2 are regexps themselves,
L(e1 + e2) := L(e1) U L(e2).
So e.g. if a and b are characters, L(a+b) = {a, b}.
(3)
For the regexp e1 e2 (concatenation), where e1 and e2 are regexps themselves,
L(e1 e2) := all words w such that
we can write w = w_1 w_2 with w_1 in L(e1) and w_2 in L(e2).
(4)
What about a regular expression e*, where e might be a regular expression itself? Intuitively, a word is in L(e*) if it has the form
w_1 w_2 w_3 w_4 ... w_n, with w_i in L(e) for each i.
So
L(e*) := all words w such that we can write
w = w_1 w_2 .. w_n
for some n >= 0 with all w_i in L(e) (for i = 1, 2, ..., n)
So, what about L((a* + b*))?
L((a* + b*))
(according to rule 2)
= L(a*) U L(b*)
(according to rule 4/1)
= {eps, a, aa, aaa, aaaa, ...} U {eps, b, bb, bbb, bbbb, ...}
= all strings that have either only a's OR only b's in it
(including eps, the so-called empty word)
Similarly for (a* b*):
L((a* b*))
(according to rule 3)
= all words w = w_1 w_2 with w_1 in L(a*) and w_2 in L(b*)
= {eps eps, eps b, a eps, ab, aa eps, aab, ...}
= {eps, b, a, ab, aa, aab, aabb, ... }
= all strings that first have zero or more a's, then zero or more b's.
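Rules (2) to (4) translate almost literally into set operations, provided the languages are cut off at some maximum word length. The following Python sketch does that. The bound N and the function names union, concat and star are assumptions made for the illustration.

```python
def union(X, Y):
    """Rule 2: L(e1 + e2) = L(e1) U L(e2)."""
    return X | Y


def concat(X, Y, max_len):
    """Rule 3: all words w1 w2 with w1 in X and w2 in Y (up to max_len)."""
    return {x + y for x in X for y in Y if len(x + y) <= max_len}


def star(X, max_len):
    """Rule 4: all words w1 ... wn with n >= 0 and each wi in X (up to max_len)."""
    result = {""}            # n = 0 gives the empty word eps
    frontier = {""}
    while frontier:
        # Append one more word from X; keep only words not seen before.
        frontier = concat(frontier, X, max_len) - result
        result |= frontier
    return result


N = 4
a_star, b_star = star({"a"}, N), star({"b"}, N)

plus_lang = union(a_star, b_star)         # L(a* + b*)
cat_lang = concat(a_star, b_star, N)      # L(a* b*)

print("ab" in plus_lang, "ab" in cat_lang)   # False True
print(plus_lang < cat_lang)                  # True: a proper subset
```

So "ab" is in L(a* b*) but not in L(a* + b*), and every word of the union also appears in the concatenation.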
To begin with, I think it helps to "deconstruct" the regular expression as we did above, since regular expressions can also be seen as trees, just like the better-known arithmetic expressions. For example:
    +
   / \
  *   *
  |   |
  a   b

Evaluating set expressions

I have a universe of elements organized into n non-disjoint sets. I have m expressions built from these sets using union/intersection/difference operators. Given an element, I need to evaluate these m expressions to find out which of the "derived" sets contain the element. I do not want to compute the "derived" sets themselves, because that would be very inefficient in both time and space. Is there a way to tell whether an element lies in one of the derived sets just by looking at its expression? For example, if the expression is C = A U B and the element lies in set A, then I can say that it lies in set C. Are there any C libraries to perform computations of this nature?
If I'm not mistaken:
let e = the element
Replace each set (A, B, ...) with true if e is in that set and false if it is not. Then convert the set operators to their logical equivalents and evaluate the expression as a boolean. It should all map well to boolean operators, even xor and the like.
For example, if e is in both A and B, but not D:
C = (A U B) xor D
it would be in C because
C = (true or true) xor false
-> (true) xor false
-> true
That could be pretty fast if you can quickly find if an element is in a set
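A minimal sketch of that idea, written in Python rather than C for brevity. The expression encoding (nested tuples with operator names) and the function name contains are assumptions made up for this example, not an existing library.

```python
def contains(expr, member_of):
    """Decide whether one fixed element lies in the set denoted by `expr`.

    expr      -- a set name (str) or a tuple (op, left, right)
    member_of -- dict mapping each base set name to True/False for the element
    """
    if isinstance(expr, str):
        return member_of[expr]
    op, left, right = expr
    l = contains(left, member_of)
    r = contains(right, member_of)
    if op == "union":
        return l or r
    if op == "intersection":
        return l and r
    if op == "difference":
        return l and not r
    if op == "xor":                      # symmetric difference
        return l != r
    raise ValueError(f"unknown operator: {op}")


# C = (A U B) xor D, with the element in A and B but not in D.
C = ("xor", ("union", "A", "B"), "D")
print(contains(C, {"A": True, "B": True, "D": False}))   # True
```

The cost per expression is linear in the size of the expression and independent of the sizes of the sets, given the membership flags for the base sets.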