Regular Expression Simplification Issue - regex

I'm trying to understand the equivalence between regular expressions α and β defined below, but I'm losing my mind over conflicting information.
a+b: a or b
ab: concatenation of a and b
$: empty string
α = (1*+0)+(1*+0)(0+1)*($+0+1)
β = (1*+0)(0+1)*($+0+1)
https://ivanzuzak.info/noam/webapps/regex_simplifier/ says that α is equivalent to β.
My school, however, teaches that concatenation binds more tightly than union, meaning that:
11*+0 =/= 1(1*+0)
Which would mean that my α looks like this with parentheses:
α = (1*+0) + ( (1*+0)(0+1)*($+0+1) )
and that
α =/= ( (1*+0) + (1*+0) ) (0+1)*($+0+1)
I hope it's clear what my problem is, I'd appreciate any kind of help. Thanks.

Usually, two regular expressions are considered equivalent when they match the same set of words.
How they match it is not relevant. Therefore it doesn't matter which of the operators has greater precedence.
Note the subtle difference between being equal (in written form) and being equivalent (having the same effect).

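Alright, it turns out that I had misunderstood why b+b <=> b.
The rule is that L1 ∪ L2 = L2 whenever L1 is a subset of L2. Here L(1*+0) is a subset of L((1*+0)(0+1)*($+0+1)), because (0+1)*($+0+1) can match the empty string, so α and β denote the same language.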
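A brute-force check can back this up. The sketch below is only an illustration: it translates the notation to Python's re syntax (+ becomes |, and $ becomes an empty alternative, both assumptions of this sketch) and compares the two expressions on every binary string up to a small length. The names ALPHA, BETA, language and MAX_LEN are made up for the example.

```python
import re
from itertools import product

# The question's notation translated to Python's re syntax (an assumption of
# this sketch): '+' becomes '|', and '$' (empty string) becomes an empty branch.
ALPHA = r"(1*|0)|(1*|0)(0|1)*(|0|1)"
BETA = r"(1*|0)(0|1)*(|0|1)"


def language(pattern: str, max_len: int) -> set[str]:
    """All binary strings up to max_len that the pattern matches in full."""
    words = ("".join(bits) for n in range(max_len + 1)
             for bits in product("01", repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


MAX_LEN = 8
first_term = language(r"1*|0", MAX_LEN)
beta_words = language(BETA, MAX_LEN)

print(first_term <= beta_words)                # is L1 a subset of L2?
print(language(ALPHA, MAX_LEN) == beta_words)  # same words up to MAX_LEN?
```

Agreement up to a bounded length is evidence, not a proof; the subset argument is what actually establishes the equivalence.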

Substituting an operation in Z3 expression

I have a Z3 formula of the form (> Expr1 Expr2), and I would like to change it to (< Expr1 Expr2) while preserving the structure of Expr1 and Expr2. As I understand it, substitute is useful for replacing variables with others, but I am not sure which arguments I should give it to change the operator, if that is even possible. Is it possible with substitute or by another method?
I am using the OCaml bindings.
Thanks in advance for your answers :)
You can just create a new expression, e.g.,
(* Rebuild the comparison from its two operands; mk_ult assumes bit-vector operands. *)
let ult_of_ugt ctxt exp = match Expr.get_args exp with
  | [x; y] -> BitVector.mk_ult ctxt x y
  | _ -> invalid_arg "expected two operands"

Match warning and pattern-matching in SML

I was wondering what a good strategy would be for predicting whether a pattern match in SML will produce the Match warning.
Consider the following function:
fun f 7 (x,y) = x * 5.1 | f x (y,#"a") = y;
At first glance, it looks like it should not produce the Match warning, but when I run it, it does.
From my point of view, we handle all of the cases. Which case don't we handle? Even for f 7 (x,#"a") we know which case applies (the first one).
My question is, how to decide that the function will output that warning.
Also, I would be glad for an answer why the following function is invalid:
fun f (x::xs) (y::ys) (z::zs) = y::xs::ys::zs;
Without zs it's valid. How does zs change it?
My question is, how to decide that the function will output that warning.
The compiler has an algorithm that decides this.
Either use the compiler and have it warn you, or use a similar heuristic in your head.
See Warnings for pattern matching by Luc Maranget (2007).
It covers the problem, algorithm and implementation of finding missing and duplicate patterns.
A useful heuristic: Line patterns up, e.g. like:
fun fact 0 = 1
  | fact n = n * fact (n - 1)
and ask yourself: Is there any combination of values that is not addressed by exactly one case of the function? Each function case should address some specific, logical category of the input. Since your example isn't a practical example, this approach cannot be used, since there are no logical categories over the input.
And fact is a bit simple, since it's very easy to decide if it belongs to the categories 0 or n.
And yet, is the value ~1 correctly placed in one of these categories?
Here is a practical example of a function with problematic patterns:
fun hammingDistance [] [] = SOME 0
  | hammingDistance (x::xs) (y::ys) =
    if length xs <> length ys then NONE else
    if x = y
    then hammingDistance xs ys
    else Option.map (fn d => d + 1) (hammingDistance xs ys)
It may seem that there are two logical cases: Either the lists are empty, or they're not:
1. The input lists are empty, in which case the first body is activated.
2. The input lists are not empty, in which case they have different or equal length.
   a. If they have different lengths, NONE.
   b. If they have equal lengths, compute the distance.
There's a subtle bug, of course, because the first list can be empty while the second one isn't, and the second list can be empty while the first one isn't. When that happens, neither body is hit, and the distinction between different and equal lengths is never made. This is because the task of categorizing is split between pattern matching and if-then-else, with precedence given to pattern matching.
What I do personally to catch problems like these preemptively is to think like this:
When I'm pattern matching on a list (just for example), I have to cover two constructors (1. [], 2. ::), and when I'm pattern matching on two lists, I have to cover the Cartesian product of their constructors (1. [], [], 2. [], ::, 3. ::, [], and 4. ::, ::).
I can count only two patterns/bodies, and neither of them aims to cover more than one of my four cases, so I know that I'm missing some.
If there had been a case with variables, I have to ask how many of my common cases it covers, e.g.
fun hammingDistance (x::xs) (y::ys) =
    if x = y
    then hammingDistance xs ys
    else Option.map (fn d => d + 1) (hammingDistance xs ys)
  | hammingDistance [] [] = SOME 0
  | hammingDistance _xs _ys = NONE
Here there are only three patterns/bodies, but the last one is a catch-all; _xs and _ys match all possible lists, empty or non-empty, except those already matched by one of the previous patterns. So this third case accounts for both 2. [], :: and 3. ::, [].
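The counting argument above can be sketched mechanically. This is a toy model in Python, not the compiler's algorithm (see the Maranget paper for that): it only looks at the top-level constructor of each list argument, so nested patterns such as x::y::rest and literal patterns are out of scope. The names CONSTRUCTORS and uncovered, and the "nil"/"cons"/"_" encoding, are made up for the illustration.

```python
from itertools import product

# Toy model (an assumption of this sketch): each argument is a list, and the
# pattern for an argument is "nil" for [], "cons" for x::xs, or "_" for a
# variable that matches anything.
CONSTRUCTORS = ("nil", "cons")


def uncovered(clauses: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Constructor combinations that no clause matches."""
    arity = len(clauses[0])

    def matches(clause, case):
        return all(p == "_" or p == c for p, c in zip(clause, case))

    # Walk the Cartesian product of constructors and keep the unmatched cases.
    return [case for case in product(CONSTRUCTORS, repeat=arity)
            if not any(matches(clause, case) for clause in clauses)]


# First hammingDistance: clauses ([], []) and (x::xs, y::ys).
first_version = [("nil", "nil"), ("cons", "cons")]
# Second hammingDistance: adds the catch-all clause (_xs, _ys).
second_version = [("cons", "cons"), ("nil", "nil"), ("_", "_")]

print(uncovered(first_version))   # [('nil', 'cons'), ('cons', 'nil')]
print(uncovered(second_version))  # []
```

The two cases reported for the first version are exactly cases 2 and 3 from the list of four.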
So I can't simply count each pattern/body once. Some may account for more than one class of input if they contain very general patterns via pattern variables. And some may account for less of the total input space if they contain overly specific patterns via multiple constructors. E.g.
fun pairs (x::y::rest) = (x, y) :: pairs rest
  | pairs [] = []
Here x::y::rest is so specific that I'm not covering the case of exactly one element.

REGEX L(r) = {a^n b^m : n + m is even}, r =?

So I did a problem earlier that said:
L(r) = {w in {a,b}* : w contains at least 2 a's}
For that one I said {a^2n , b} because that guarantees a string like aab or aabaab etc. I'm not sure how to approach the one I posted about in the title. Possibly a solution might be a^2n, b^2m so it's always even, but two odd numbers, like a^n b^3m, also always give an even sum. Am I allowed to set boundaries like n>=m?
Thank you!
You correctly observe that n and m must either be both even or both odd. It only needs to be added that an odd number is one more than an even number.
A simple regular expression for "an even number of a's" ({a^(2n) : n ≥ 0}) is (aa)*, while "an odd number of a's" is (aa)*a.
Building on that, we can distinguish two cases for the original question: (aa)*(bb)* and (aa)*a(bb)*b, which can be combined into (aa)*(ab+ε)(bb)*. (Assuming you are using + for alternation and ε for the empty string.)
r = ((a+b)^2)*; I think this regular expression also gives the right answer.
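Candidate expressions like these are easy to test by brute force. The sketch below (Python's re syntax, with | for alternation and an empty branch for ε, both assumptions of this sketch) compares a pattern against the direct definition of the language on all strings up to a small length. The helper names in_language and disagreements are made up. Note that ((a+b)^2)* accepts every even-length string, including ba, which is not of the form a^n b^m.

```python
import re
from itertools import product


def in_language(w: str) -> bool:
    """Direct definition: w = a^n b^m with n + m even."""
    return re.fullmatch(r"a*b*", w) is not None and len(w) % 2 == 0


def disagreements(pattern: str, max_len: int = 8) -> list[str]:
    """Strings up to max_len on which the pattern and the definition differ."""
    words = ("".join(cs) for n in range(max_len + 1)
             for cs in product("ab", repeat=n))
    return [w for w in words
            if (re.fullmatch(pattern, w) is not None) != in_language(w)]


print(disagreements(r"(aa)*(ab|)(bb)*"))    # []
print(disagreements(r"((a|b)(a|b))*")[:3])  # ['ba', 'aaba', 'abaa']
```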

Simplify terms within an expression

Sympy can simplify this:
In [26]: (asinh(sinh(x))).simplify()
Out[26]: x
but doesn't simplify that:
In [28]: (asinh(sinh(x))+1).simplify()
Out[28]: asinh(sinh(x)) + 1
How can I ask for subparts of an expression to be simplified? If possible, I'd like to avoid equation-scale simplification, e.g. that a common denominator is found and factorised out for all terms.
Arguably, this simplification should not happen at all because it's not always true: for example, asinh(sinh(2*I)) is not 2*I. The current implementation of simplify has a clause for "canceling" a function and its inverse, which only applies if the entire expression is that function, and does not pay attention to things like asin(sin(pi)) being 0 rather than pi. Inverses are tricky.
But the following approach, based on replace, will replace all known "function-inverse" pairs:
from sympy import Function, asin, asinh, exp, sin, sinh, sqrt, symbols
x = symbols('x')
expr = sqrt(asinh(sinh(x))) + sin(asin(exp(2*x+1)))
expr = expr.replace(lambda f: isinstance(f, Function) and isinstance(f.args[0], f.inverse(argindex=1)), lambda f: f.args[0].args[0])
# expr is now sqrt(x) + exp(2*x + 1)
The first argument of replace is a filter function,
lambda f: isinstance(f, Function) and isinstance(f.args[0], f.inverse(argindex=1))
which asserts that f is a Function whose first argument is its inverse.
The second argument of replace is the action to be done on matching subexpression,
lambda f: f.args[0].args[0]
means: replace by the argument of the argument, i.e., asinh(sinh(x)) -> x.
As noted above, it's not guaranteed that the result of such "simplification" is mathematically equivalent to the original expression.
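The two counterexamples mentioned earlier can be checked numerically with nothing but the standard library. This is only a floating-point illustration of why cancelling a function against its inverse is unsafe, not a SymPy feature; the variable name roundtrip is made up.

```python
import cmath
import math

# asinh(sinh(2i)) evaluated numerically: sinh(2i) = i*sin(2), and the principal
# branch of asinh maps that back to i*(pi - 2), not to 2i.
roundtrip = cmath.asinh(cmath.sinh(2j))
print(roundtrip)

# asin(sin(pi)) is 0 up to rounding error, not pi.
print(math.asin(math.sin(math.pi)))
```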

Check if `LIKE` patterns intersects in Postgres

There are two strings in a request that are patterns used within LIKE expressions (with _ and % placeholders). I want to find out if these patterns intersect (have some string that matches them both). Is there any way to do that?
A “LIKE pattern” corresponds to a finite or infinite set of strings, each of which matches the given pattern. I want to check if the intersection of the string sets for two given patterns is not empty, so it may be better to speak of a conjunction of patterns. In mathematical language:
S: the set of strings
P: the set of patterns (where each pattern has one or more string representations)
Sᵢ: the subset of strings (Sᵢ ⊂ S) that match pattern pᵢ (where i can be any index).
In equation form: “Sᵢ = {s | s ∈ S, s matches pᵢ, pᵢ ∈ P}”, which means: “Sᵢ is the set of strings that match pattern pᵢ”.
Or in another notation: “Sᵢ ⊂ S, ∀pᵢ ∈ P ∀s ∈ S (s matches pᵢ ≡ s ∈ Sᵢ)”, which means: “Sᵢ is a subset of the strings, and a string is an element of Sᵢ exactly when it matches pattern pᵢ”.
Let's define the conjunction of patterns: “p₁ ∧ p₂ = p₃ ≡ S₁ ∩ S₂ = S₃”, which means: “The set of strings that match the conjunction of patterns p₁ and p₂ is the intersection of the set of strings that match p₁ and the set of strings that match p₂”.
For example:
ab_d and %cd intersect
k%n and kl___ intersect
I want to find out if these patterns intersect (have some string that matches them both). Is there any way to do that? (...) I want to check if the intersection of the string sets for two given patterns is not empty.
So, if I get this right, given two like patterns, p1 and p2, you're interested in whether there exists a (yet to be determined) string that matches p1 as well as p2.
E.g.:
select check_pattern('a%', 'b_'); -- false
select check_pattern('a%', '_b'); -- true ('ab')
Are you even sure there's a general solution to that problem in the first place?
Assuming there is, plain SQL isn't the right tool to find the solution imho, because you cannot readily express this in terms of "here's my (finite) set of data, join/filter them and yield a set based on it". To find the solution in SQL terms, you'd need to generate the set that stems from your data, and that's obviously not an option when the set in question is infinite.
Methinks you'd want to break up the problem into smaller parts and use a procedural language such as C, Perl, Lisp, whatever you fancy.
One potential solution might be this:
If both p1 and p2 are open on both ends or different ends, the answer is trivially yes: strings matching %foo% will intersect with those matching %bar%, just as strings matching foo% will intersect strings matching %bar.
If p1 yields a finite set (i.e. it contains no %), you could imagine iterating the entire set of potential matches for p1 using generate_series() or a for/while/whatever loop, and trying p2 on each string. It's ugly and inefficient, but it'll eventually work.
If p1 and p2 are both anchored (e.g. abc% and def% or %abc and %def), or reasonably anchored (e.g. _abc% and abcd%) the solution is trivial enough as well by considering the anchored part and proceeding as in the prior case.
I'll leave it to you to enumerate and solve the remaining cases if any...
The key, I think, will be to nail down the anchored parts of your patterns that yield a finite set of strings, and to stick to checking whether the (finite) set of strings they will match will intersect.
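For what it's worth, patterns built only from literal characters, _ and % describe regular languages, so the general question is decidable without enumerating strings. Below is a minimal sketch in Python rather than SQL: it walks pairs of positions in the two patterns and checks whether both can be consumed by a common string. It assumes no escape characters and an unrestricted alphabet. The name check_pattern just mirrors the hypothetical function in the examples above; it is not a Postgres built-in.

```python
def check_pattern(p: str, q: str) -> bool:
    """True if some string matches both LIKE patterns p and q.

    Assumes only literals, '_' (any one character) and '%' (any sequence),
    with no escape sequences.
    """
    seen = set()
    stack = [(0, 0)]  # (position in p, position in q)
    while stack:
        i, j = stack.pop()
        if (i, j) in seen:
            continue
        seen.add((i, j))
        if i == len(p) and j == len(q):
            return True  # both patterns fully consumed by the same string
        # '%' may match the empty string: step past it without consuming input.
        if i < len(p) and p[i] == "%":
            stack.append((i + 1, j))
        if j < len(q) and q[j] == "%":
            stack.append((i, j + 1))
        # Otherwise both patterns must consume the same next character.
        if i < len(p) and j < len(q):
            a, b = p[i], q[j]
            if a in "_%" or b in "_%" or a == b:
                # '%' stays in place after consuming; anything else advances.
                stack.append((i if a == "%" else i + 1,
                              j if b == "%" else j + 1))
    return False


print(check_pattern("a%", "b_"))      # False
print(check_pattern("a%", "_b"))      # True
print(check_pattern("ab_d", "%cd"))   # True
print(check_pattern("k%n", "kl___"))  # True
```

Each pair of positions is visited at most once, so the work is bounded by the product of the two pattern lengths. The same logic could be ported to a procedural language inside the database, but that port is not shown here.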