Formalising regular expressions with a complement operation - regex

I'm playing with a formalisation of a certified regular expression matcher in Idris (I believe that the same problem holds in any type theory based proof assistant, such as Agda and Coq) and I'm stuck on how to define semantics of the complement operation. I have the following data type to represent semantics of regular expressions:
data InRegExp : List Char -> RegExp -> Type where
InEps : InRegExp [] Eps
InChr : InRegExp [ a ] (Chr a)
InCat : InRegExp xs l ->
InRegExp ys r ->
zs = xs ++ ys ->
InRegExp zs (Cat l r)
InAltL : InRegExp xs l ->
InRegExp xs (Alt l r)
InAltR : InRegExp xs r ->
InRegExp xs (Alt l r)
InStar : InRegExp xs (Alt Eps (Cat e (Star e))) ->
InRegExp xs (Star e)
InComp : Not (InRegExp xs e) -> InRegExp xs (Comp e)
My problem is representing the type of the InComp constructor, since it has a non-strictly-positive occurrence of InRegExp due to the use of Not. Since such data types can be used to define non-terminating functions, they are rejected by the termination checker. I would like to define the semantics in a way that is accepted by Idris's termination checker.
Is there some way I could represent the semantics of the complement operation without negative occurrences of InRegExp?

You can define InRegex by recursion on Regex. In that case, strict positivity is no issue, but we have to recurse structurally:
import Data.List.Quantifiers
data Regex : Type where
Chr : Char -> Regex
Eps : Regex
Cat : Regex -> Regex -> Regex
Alt : Regex -> Regex -> Regex
Star : Regex -> Regex
Comp : Regex -> Regex
InRegex : List Char -> Regex -> Type
InRegex xs (Chr x) = xs = x :: []
InRegex xs Eps = xs = []
InRegex xs (Cat r1 r2) = (ys ** (zs ** (xs = ys ++ zs, InRegex ys r1, InRegex zs r2)))
InRegex xs (Alt r1 r2) = Either (InRegex xs r1) (InRegex xs r2)
InRegex xs (Star r) = (yss ** (All (\ys => InRegex ys r) yss, xs = concat yss))
InRegex xs (Comp r) = Not (InRegex xs r)
We would need an inductive type for the Star case if we wanted to use our old definition. The following obviously doesn't work:
InRegex xs (Star r) = InRegex xs (Alt Eps (Cat r (Star r)))
However, we can just simply change the definition and make finiteness explicit:
InRegex xs (Star r) = (yss ** (All (\ys => InRegex ys r) yss, xs = concat yss))
This has the intended meaning and I don't see any drawbacks to it.
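For comparison, the same recursion on the regex can be run as a plain Boolean matcher. The following is only a Haskell sketch, not Idris and not a certified matcher. The names matches and splits are made up for this illustration, and the Star case only tries non-empty prefixes so that the recursion terminates.

```haskell
data Regex
  = Chr Char
  | Eps
  | Cat Regex Regex
  | Alt Regex Regex
  | Star Regex
  | Comp Regex

-- All ways to split a list into a prefix and a suffix (hypothetical helper).
splits :: [a] -> [([a], [a])]
splits xs = [splitAt n xs | n <- [0 .. length xs]]

-- Boolean counterpart of InRegex, by structural recursion on the regex.
matches :: String -> Regex -> Bool
matches xs (Chr c)     = xs == [c]
matches xs Eps         = null xs
matches xs (Cat r1 r2) = or [matches ys r1 && matches zs r2 | (ys, zs) <- splits xs]
matches xs (Alt r1 r2) = matches xs r1 || matches xs r2
-- Star: either the empty word, or a non-empty chunk followed by more chunks.
-- Dropping the empty-prefix split makes the remaining input strictly shorter.
matches xs (Star r)    =
  null xs || or [matches ys r && matches zs (Star r) | (ys, zs) <- drop 1 (splits xs)]
-- Complement is just Boolean negation, mirroring Not in the type-level version.
matches xs (Comp r)    = not (matches xs r)
```

Here complement causes no trouble because the definition is a function over the regex rather than an inductive family, which is the same reason the recursive InRegex is accepted.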

You could mutually define NotInRegExp which would explain what it means to not be recognised by a regexp (I haven't checked whether this is valid syntax).
data NotInRegExp : List Char -> RegExp -> Type where
NotInEps : Not (xs = []) -> NotInRegExp xs Eps
NotInChr : Not (xs = [ a ]) -> NotInRegExp xs (Chr a)
NotInCat : (forall xs ys, zs = xs ++ ys ->
NotInRegExp xs l
+ InRegExp xs l * NotInRegExp ys r) ->
NotInRegExp zs (Cat l r)
etc...
You should then be able to define a nice decision procedure:
check : (xs : List Char) -> (e : RegExp) -> Either (InRegExp xs e) (NotInRegExp xs e)

You could also define this type by recursion on the RegExp plus some inductive datatype for the semantics of Star.
I guess it wouldn't interact as nicely with the built-in pattern matching but it would have the same induction principle.


Understanding function which implements foldr and foldl

There are some cases where I don't understand how foldr and foldl are used in a function.
Here are a couple of examples; I then explain why I don't understand them:
-- Two implementation of filter and map
map' f = foldr (\x acc -> (f x):acc) []
map'' f xs = foldl (\acc x -> acc ++ [(f x)]) [] xs
filter' f xs = foldr(\x acc -> if(f x) then x:acc else acc) [] xs
filter'' f = foldl(\acc x -> if(f x) then acc++[x] else acc) []
Why does map'' make use of xs but not map'? Shouldn't map' need a list for the list comprehension formula as well?
The same goes for filter' vs filter''.
Here is an implementation which inserts elements into a sorted sequence:
insert e [] = [e]
insert e (x:xs)
| e > x = x: insert e xs
| otherwise = e:x:xs
sortInsertion xs = foldr insert [] xs
sortInsertion'' xs = foldl (flip insert) [] xs
Why are the arguments for insert flipped in sortInsertion ([] xs) (empty list and list) compared to the definition of insert (e []) (element and empty list)?
Why does map'' make use of xs but not map'? Shouldn't map' need a list for the list comprehension formula as well? The same goes for filter' vs filter''.
This is called “eta-reduction” and it’s a common way of omitting redundant parameter names (“point-free style”). Essentially whenever you have a function whose body is just an application of a function to its argument, you can reduce away the argument:
add :: Int -> Int -> Int
add x y = x + y
-- “To add x and y, call (+) on x and y.”
add :: (Int) -> (Int) -> (Int)
add x y = ((+) x) y
-- “To add x, call (+) on x.”
add :: (Int) -> (Int -> Int)
add x = (+) x
-- “To add, call (+).”
add :: (Int -> Int -> Int)
add = (+)
More precisely, if you have f x = g x where x does not appear in g, then you can write f = g.
A common mistake is then wondering why f x = g . h x can’t be written as f = g . h. It doesn’t fit the pattern because the (.) operator is the top-level expression in the body of f: it’s actually f x = (.) g (h x). You can write this as f x = (((.) g) . h) x and then reduce it to f = (.) g . h or f = fmap g . h using the Functor instance for ->, but this isn’t considered very readable.
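As a small sketch of both points, the definitions below give the map from the question with and without its list argument, and then the composition case. The names mapExplicit, mapReduced and doubleSum1 to doubleSum3 are invented for this illustration, with g taken to be (* 2) and h taken to be (+).

```haskell
-- Explicit list argument.
mapExplicit :: (a -> b) -> [a] -> [b]
mapExplicit f xs = foldr (\x acc -> f x : acc) [] xs

-- Eta-reduced: the trailing xs is dropped from both sides.
mapReduced :: (a -> b) -> [a] -> [b]
mapReduced f = foldr (\x acc -> f x : acc) []

-- The composition case f x = g . h x, with g = (* 2) and h = (+).
doubleSum1, doubleSum2, doubleSum3 :: Int -> Int -> Int
doubleSum1 x = (* 2) . (+) x       -- x is still named
doubleSum2   = (.) (* 2) . (+)     -- f = (.) g . h
doubleSum3   = fmap (* 2) . (+)    -- f = fmap g . h, Functor instance for functions
```

All three doubleSum variants compute (x + y) * 2, and the two map variants agree on every input.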
Why are the arguments for insert flipped in sortInsertion
The functional parameters of foldr and foldl have different argument order:
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
Or, with more verbose type variable names:
foldr
:: (Foldable container)
=> (element -> accumulator -> accumulator)
-> accumulator -> container element -> accumulator
foldl
:: (Foldable container)
=> (accumulator -> element -> accumulator)
-> accumulator -> container element -> accumulator
This is just a mnemonic for the direction that the fold associates:
foldr f z [a, b, c, d]
==
f a (f b (f c (f d z))) -- accumulator on the right (second argument)
foldl f z [a, b, c, d]
==
f (f (f (f z a) b) c) d -- accumulator on the left (first argument)
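To see the two nestings concretely, here is a sketch using subtraction (where the grouping changes the result) and the insertion sort from the question. The function is renamed insertSorted here purely to avoid clashing with Data.List.insert; sortR and sortL are likewise illustrative names.

```haskell
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted e [] = [e]
insertSorted e (x:xs)
  | e > x     = x : insertSorted e xs
  | otherwise = e : x : xs

-- foldr hands the function (element, accumulator), which is insertSorted's order.
-- foldl hands it (accumulator, element), so the arguments must be flipped.
sortR, sortL :: Ord a => [a] -> [a]
sortR = foldr insertSorted []
sortL = foldl (flip insertSorted) []

rightNested, leftNested :: Int
rightNested = foldr (-) 0 [1, 2, 3, 4]  -- 1 - (2 - (3 - (4 - 0))) = -2
leftNested  = foldl (-) 0 [1, 2, 3, 4]  -- (((0 - 1) - 2) - 3) - 4 = -10
```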
That is partial function application.
map' f = foldr (\x acc -> (f x):acc) []
is just the same as
map' f xs = foldr (\x acc -> (f x):acc) [] xs
if you omit xs on both sides.
However, besides this explanation, I think you need a beginner book for Haskell. Consider LYAH.

Inference rules for subsequence order

I am doing some exercises with the subsequence order,
record _⊑₀_ {X : Set} (xs ys : List X) : Set where
field
indices : Fin (length xs) → Fin (length ys)
embed : ∀ {a b : Fin (length xs)} → a < b → indices a < indices b
eq : ∀ {i : Fin (length xs)} → xs ‼ i ≡ ys ‼ (indices i)
where
_‼_ : ∀ {X : Set} → (xs : List X) → Fin (length xs) → X
[] ‼ ()
(x ∷ xs) ‼ fzero = x
(x ∷ xs) ‼ fsuc i = xs ‼ i
is the usual safe lookup.
Now while the record version is nice, I'd like to use inference rules instead as that's probably easier than constructing embeddings and proving properties about them each time.
So I try the following,
infix 3 _⊑₁_
data _⊑₁_ {X : Set} : (xs ys : List X) → Set where
nil : ∀ {ys} → [] ⊑₁ ys
embed : ∀ {x y} → x ≡ y → x ∷ [] ⊑₁ y ∷ []
cons : ∀ {xs₁ ys₁ xs₂ ys₂} → xs₁ ⊑₁ ys₁ → xs₂ ⊑₁ ys₂ → xs₁ ++ xs₂ ⊑₁ ys₁ ++ ys₂
Which looks promising. Though I have had trouble proving it to be a sound and complete reflection of the record version.
Anyhow, the subsequence order is transitive, and this is a bit of trouble:
⊑₁-trans : ∀ {X : Set} (xs ys zs : List X) → xs ⊑₁ ys → ys ⊑₁ zs → xs ⊑₁ zs
⊑₁-trans .[] ys zs nil q = nil
⊑₁-trans ._ ._ [] (embed x₁) q = {! q is absurd !}
⊑₁-trans ._ ._ (x ∷ zs) (embed x₂) q = {!!}
⊑₁-trans ._ ._ zs (cons p p₁) q = {!!}
We get unification errors when pattern matching on a seemingly impossible pattern q. So I have tried other data versions of the order that avoid this unification error but then other proofs have seemingly-absurd patterns lying around.
I'd like some help with a data version of the subsequence order (with soundness and completeness proofs, that'd be nice).
Are there any general heuristics to try when transforming a proposition in formula form into an inference/data form?
Thank-you!
We get unification errors when pattern matching on a seemingly impossible pattern q.
That's the usual "green slime" problem. In the words of Conor McBride:
The presence of ‘green slime’ — defined functions in the return types
of constructors — is a danger sign.
See here for some techniques to beat the green slime.
For _⊑_ use order preserving embeddings:
infix 3 _⊑_
data _⊑_ {α} {A : Set α} : List A -> List A -> Set α where
stop : [] ⊑ []
skip : ∀ {xs ys y} -> xs ⊑ ys -> xs ⊑ y ∷ ys
keep : ∀ {xs ys x} -> xs ⊑ ys -> x ∷ xs ⊑ x ∷ ys
⊑-trans : ∀ {α} {A : Set α} {xs ys zs : List A} -> xs ⊑ ys -> ys ⊑ zs -> xs ⊑ zs
⊑-trans p stop = p
⊑-trans p (skip q) = skip (⊑-trans p q)
⊑-trans (skip p) (keep q) = skip (⊑-trans p q)
⊑-trans (keep p) (keep q) = keep (⊑-trans p q)
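The same stop/skip/keep structure can be tried out in Haskell as an untyped shadow of the Agda type. This is only a sketch: Ope, select and trans are names chosen here, the list indices are gone, and so the functions return Maybe where Agda's types would rule the bad cases out.

```haskell
-- An embedding is a recipe: for each element of the longer list, skip it or keep it.
data Ope = Stop | Skip Ope | Keep Ope
  deriving (Eq, Show)

-- Apply an embedding to the longer list to recover the shorter one.
select :: Ope -> [a] -> Maybe [a]
select Stop     []       = Just []
select (Skip p) (_ : ys) = select p ys
select (Keep p) (y : ys) = (y :) <$> select p ys
select _        _        = Nothing   -- recipe and list lengths disagree

-- Composition, clause for clause like the Agda proof of transitivity.
trans :: Ope -> Ope -> Maybe Ope
trans Stop     Stop     = Just Stop
trans p        (Skip q) = Skip <$> trans p q
trans (Skip p) (Keep q) = Skip <$> trans p q
trans (Keep p) (Keep q) = Keep <$> trans p q
trans _        _        = Nothing    -- ill-matched embeddings
```

For well-matched inputs, selecting with the composite should agree with selecting twice, which is the computational content of transitivity.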

Haskell right-sided recursion to left-sided recursion

In my problem I have a list of lists, and I want to find the lists that are selectors (a selector is a list containing exactly one element from each list) and that satisfy a special condition.
The code to generate all selectors would look like:
selectors :: [[a]] -> [[a]]
selectors [] = [[]]
selectors (y:ys) = [ (x:xs) | x <- y, xs <- selectors ys]
If I wanted to add an extra condition, it would look like
selectors :: [[a]] -> ([a] -> Bool) -> [[a]]
selectors [] _ = [[]]
selectors (y:ys) f = [ (x:xs) | x <- y, xs <- selectors ys f, f xs]
However, in my problem I need the condition to depend on both the candidate element and the list built so far. So this would be something like:
selectors :: [[a]] -> ( a-> [a] -> Bool) -> [[a]]
selectors [] _ = [[]]
selectors (y:ys) f = [ (x:xs) | x <- y, xs <- selectors ys f, f x xs]
This works very slowly, because the recursion first goes very deep and the real work only starts from there. It would be MUCH faster if the list were built from the left, so that whenever I try to add a new element and find that it cannot be added, I just try the next candidate. How can I make it work this way?
You can change the order of some searches by commuting the loop bodies.
for i in foo                 for j in bar
  for j in bar     versus      for i in foo
    do(i, j)                     do(i, j)
The same effect can be achieved in list comprehension syntax. For the given example, it might be
[ (x:xs) | x <- y, xs <- selectors ys f, f x xs ]
-- versus
[ (x:xs) | xs <- selectors ys f, x <- y, f x xs ]
If we're only considering the result as a set of values (i.e. the order is immaterial) then the values are identical. Regarded as a set, the only rule concerning the order of list comprehension clauses is that referenced variables must be bound in clauses to the left of their reference site.
Let's desugar this notation a bit to see the mechanics at work in higher fidelity.
List comprehensions are (almost) equivalent to do-notation in the list monad. Without necessarily diving into what monads are, I'll claim that our list comprehension desugars like this
-- [ (x:xs) | x <- y, xs <- selectors ys f, f x xs ]
-- becomes...
do x <- y
xs <- selectors ys f
guard (f x xs)
return (x:xs)
The translation should be obvious—each generator clause containing (<-) becomes a do-syntax binding form. Each guard clause becomes a do-notation form using the (perfectly normal) function guard :: Bool -> [()]. Finally, the translation preserves order.
But now, do-notation is just syntax sugar itself! It desugars to a series of function applications. Again, to not dive into the meaning of monads, I'll just do this transformation exactly.
-- [ (x:xs) | x <- y, xs <- selectors ys f, f x xs ]
-- becomes...
y >>= (\x -> selectors ys f >>= (\xs -> guard (f x xs) >> return (x:xs)))
In particular, each generator line like x <- E becomes E >>= (\x -> ...) where the ... corresponds to the translation of the remainder of the do block. Lines like E without binding arrows translate to E >> .... We can even simplify this one level further by noting that E >> F is nothing more than E >>= (\_ -> F) so that ultimately we have
y >>= (\x -> selectors ys f >>= (\xs -> guard (f x xs) >>= (\_ -> return (x:xs))))
And as a final step, we can translate the (>>=), guard, and return functions to the format they take for the list monad. In particular ls >>= f is equal to concat (map f ls) and return x = [x]. It's actually convenient to write (>>=) in a prefix instead of infix form, as well, so we'll call it forl :: [a] -> (a -> [b]) -> [b].
The function guard is a little strange. It looks like guard b = if b then [()] else []. We'll see how it works in a moment.
forl y $ \x ->
forl (selectors ys f) $ \xs ->
forl (guard (f x xs)) $ \_ ->
[x:xs]
Now this is a full translation. If we can understand this then we've understood the mechanics of the list comprehension. For comparison, this is how the list comprehension desugars when we switch the order of the generator clauses
forl y $ \x ->                      forl (selectors ys f) $ \xs ->
  forl (selectors ys f) $ \xs ->      forl y $ \x ->
    forl (guard (f x xs)) $ \_ ->       forl (guard (f x xs)) $ \_ ->
      [x:xs]                              [x:xs]
which looks very similar to the imperative example given at the beginning. Let's show that it's actually identical.
First, we can dispatch how forl (guard (f x xs)) $ \_ -> [x:xs] works. We'll just inline the definition of guard and then forl
forl (if (f x xs) then [()] else []) (\_ -> [x:xs])
concat (map (\_ -> [x:xs]) (if (f x xs) then [()] else []))
We can "lift" the if out of the inside by noting that once we've wrapped the whole thing in an outer if, the value of (f x xs) is fixed in both the then and else branches.
if (f x xs)
then concat (map (\_ -> [x:xs]) [()])
else concat (map (\_ -> [x:xs]) [])
And finally, we can inline the maps and then the concats
if f x xs
then concat [(\_ -> [x:xs]) ()]
else concat []
if f x xs then [x:xs] else []
forl y $ \x ->                        forl (selectors ys f) $ \xs ->
  forl (selectors ys f) $ \xs ->        forl y $ \x ->
    if f x xs then [x:xs] else []         if f x xs then [x:xs] else []
And now it ought to be increasingly clear how these "for" loops work. They loop over a body and produce a list of the results. Since we expect that the body will also be a forl loop, we have to anticipate that the value in the body is a list itself, so we flatten that extra layer of lists using concat.
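The desugaring can be checked by running it. In the sketch below, forl is defined as described above, and the three functions selectorsComp, selectorsForl and selectorsSwapped (names invented here) are the original comprehension, its hand-desugared form, and the form with the two generators swapped.

```haskell
-- (>>=) for lists, in prefix "for loop" form.
forl :: [a] -> (a -> [b]) -> [b]
forl ls f = concat (map f ls)

-- The comprehension from the question.
selectorsComp :: [[a]] -> (a -> [a] -> Bool) -> [[a]]
selectorsComp []     _ = [[]]
selectorsComp (y:ys) f = [ x:xs | x <- y, xs <- selectorsComp ys f, f x xs ]

-- The same thing after desugaring and inlining guard.
selectorsForl :: [[a]] -> (a -> [a] -> Bool) -> [[a]]
selectorsForl []     _ = [[]]
selectorsForl (y:ys) f =
  forl y $ \x ->
    forl (selectorsForl ys f) $ \xs ->
      if f x xs then [x:xs] else []

-- Generators swapped: the recursive call is now the outer loop,
-- so it is evaluated once rather than once per x.
selectorsSwapped :: [[a]] -> (a -> [a] -> Bool) -> [[a]]
selectorsSwapped []     _ = [[]]
selectorsSwapped (y:ys) f =
  forl (selectorsSwapped ys f) $ \xs ->
    forl y $ \x ->
      if f x xs then [x:xs] else []
```

The first two produce results in the same order; the swapped version yields the same selectors but, in general, in a different order.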

Implement insert in haskell with foldr

How to implement insert using foldr in haskell.
I tried:
insert'' :: Ord a => a -> [a] -> [a]
insert'' e xs = foldr (\x -> \y -> if x<y then x:y else y:x) [e] xs
No dice.
I have to insert element e in list so that it goes before first element that is larger or equal to it.
Example:
insert'' 2.5 [1,2,3] => [1.0,2.0,2.5,3.0]
insert'' 2.5 [3,2,1] => [2.5,3.0,2.0,1.0]
insert'' 2 [1,2,1] => [1,2,2,1]
In last example first 2 is inserted one.
EDIT:
Thanks @Lee.
I have this now:
insert'' :: Ord a => a -> [a] -> [a]
insert'' e xs = insert2 e (reverse xs)
insert2 e = reverse . snd . foldr (\i (done, l) -> if (done == False) && (vj e i) then (True, e:i:l) else (done, i:l)) (False, [])
where vj e i = e<=i
But this is not working for:
insert'' 2 [1,3,2,3,3] => [1,3,2,2,3,3]
insert'' 2 [1,3,3,4] => [1,3,2,3,4]
insert'' 2 [4,3,2,1] => [4,2,3,2,1]
SOLUTION:
insert'' :: Ord a => a -> [a] -> [a]
insert'' x xs = foldr pom poc xs False
where
pom y f je
| je || x > y = y : f je
| otherwise = x : y : f True
poc True = []
poc _ = [x]
Thanks @Pedro Rodrigues (It just needed to change x>=y to x>y.)
(How to mark this as answered?)
You need paramorphism for that:
para :: (a -> [a] -> r -> r) -> r -> [a] -> r
foldr :: (a -> r -> r) -> r -> [a] -> r
para c n (x : xs) = c x xs (para c n xs)
foldr c n (x : xs) = c x (foldr c n xs)
para _ n [] = n
foldr _ n [] = n
with it,
insert v xs = para (\x xs r -> if v <= x then (v:x:xs) else (x:r)) [v] xs
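As a runnable sketch of this definition, the block below spells out para on its own and renames the function to insertPara (a name chosen here to avoid clashing with Data.List.insert). The list argument is eta-reduced away; otherwise it is the definition above.

```haskell
-- Like foldr, but the step function also sees the unprocessed tail xs.
para :: (a -> [a] -> r -> r) -> r -> [a] -> r
para c n (x : xs) = c x xs (para c n xs)
para _ n []       = n

-- Insert v before the first element that is greater than or equal to it.
-- When v <= x the recursive result r is never demanded, so the original
-- tail xs is reused as-is and the traversal stops there.
insertPara :: Ord a => a -> [a] -> [a]
insertPara v = para (\x xs r -> if v <= x then v : x : xs else x : r) [v]
```

Because r is left unevaluated once the insertion point is found, this also works on an infinite list, for example take 3 (insertPara 0 [1 ..]).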
We can imitate paramorphisms with foldr over init . tails, as can be seen here: Need to partition a list into lists based on breaks in ascending order of elements (Haskell).
Thus the solution is
import Data.List (tails)
insert v xs = foldr g [v] (init $ tails xs)
where
g xs@(x:_) r | v <= x = v : xs
             | otherwise = x : r
Another way to encode paramorphisms is by a chain of functions, as seen in the answer by Pedro Rodrigues, to arrange for the left-to-right information flow while passing a second copy of the input list itself as an argument (replicating the effect of tails):
insert v xs = foldr g (\ _ -> [v]) xs xs
where
g x r xs | v > x = x : r (tail xs) -- xs =@= (x:_)
| otherwise = v : xs
-- visual aid to how this works, for a list [a,b,c,d]:
-- g a (g b (g c (g d (\ _ -> [v])))) [a,b,c,d]
Unlike the version in his answer, this does not copy the rest of the list structure after the insertion point (which is possible because of paramorphism's "eating the cake and having it too").
Here's my take at it:
insert :: Ord a => a -> [a] -> [a]
insert x xs = foldr aux initial xs False
where
aux y f done
| done || x > y = y : f done
| otherwise = x : y : f True
initial True = []
initial _ = [x]
However IMHO using foldr is not the best fit for this problem, and for me the following solution is easier to understand:
insert :: Int -> [Int] -> [Int]
insert x [] = [x]
insert x z@(y : ys)
| x <= y = x : z
| otherwise = y : insert x ys
I suppose a fold isn't handy here. It always processes all elements of the list, but you need to stop when the first occurrence is found.
Of course it is possible, but you probably don't want to use this:
insert' l a = finish $ foldl step (False, []) l
  where
    -- if no insertion point was found, a goes at the end
    finish (done, l') = if done then l' else l' ++ [a]
    step (done, l') b
      | done || a > b = (done, l' ++ [b])
      | otherwise     = (True, l' ++ [a, b])

Cartesian product of infinite lists in Haskell

I have a function for finite lists
> kart :: [a] -> [b] -> [(a,b)]
> kart xs ys = [(x,y) | x <- xs, y <- ys]
but how to implement it for infinite lists? I have heard something about Cantor and set theory.
I also found a function like
> genFromPair (e1, e2) = [x*e1 + y*e2 | x <- [0..], y <- [0..]]
But I'm not sure if it helps, because Hugs only gives out pairs without ever stopping.
Thanks for help.
Your first definition, kart xs ys = [(x,y) | x <- xs, y <- ys], is equivalent to
kart xs ys = xs >>= (\x ->
ys >>= (\y -> [(x,y)]))
where
(x:xs) >>= g = g x ++ (xs >>= g)
(x:xs) ++ ys = x : (xs ++ ys)
are sequential operations. Redefine them as alternating operations,
(x:xs) >>/ g = g x +/ (xs >>/ g)
(x:xs) +/ ys = x : (ys +/ xs)
[] +/ ys = ys
and your definition should be good to go for infinite lists as well:
kart_i xs ys = xs >>/ (\x ->
ys >>/ (\y -> [(x,y)]))
testing,
Prelude> take 20 $ kart_i [1..] [101..]
[(1,101),(2,101),(1,102),(3,101),(1,103),(2,102),(1,104),(4,101),(1,105),(2,103)
,(1,106),(3,102),(1,107),(2,104),(1,108),(5,101),(1,109),(2,105),(1,110),(3,103)]
courtesy of "The Reasoned Schemer". (see also conda, condi, conde, condu).
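Put together as a self-contained sketch, with type signatures and a base case for (>>/) added so that finite lists are handled too. The added clause [] >>/ _ = [] and the name kartI are assumptions of this sketch, not part of the definitions above.

```haskell
-- Interleaving append: alternates between the two lists.
(+/) :: [a] -> [a] -> [a]
(x:xs) +/ ys = x : (ys +/ xs)
[]     +/ ys = ys

-- Interleaving bind: like (>>=) for lists, but built on (+/) instead of (++).
(>>/) :: [a] -> (a -> [b]) -> [b]
(x:xs) >>/ g = g x +/ (xs >>/ g)
[]     >>/ _ = []          -- added here so finite inputs terminate cleanly

kartI :: [a] -> [b] -> [(a, b)]
kartI xs ys = xs >>/ (\x -> ys >>/ (\y -> [(x, y)]))
```

Since (+/) only pattern matches on its first argument, each step produces an element before looking at the rest, which is what keeps the infinite case productive.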
another way, more explicit, is to create separate sub-streams and combine them:
kart_i2 xs ys = foldr g [] [map (x,) ys | x <- xs]
where
g a b = head a : head b : g (tail a) (tail b)
this actually produces exactly the same results. But now we have more control over how we combine the sub-streams. We can be more diagonal:
kart_i3 xs ys = g [] [map (x,) ys | x <- xs]
where -- works both for finite
g [] [] = [] -- and infinite lists
g a b = concatMap (take 1) a
++ g (filter (not . null) (take 1 b ++ map (drop 1) a))
(drop 1 b)
so that now we get
Prelude> take 20 $ kart_i3 [1..] [101..]
[(1,101),(2,101),(1,102),(3,101),(2,102),(1,103),(4,101),(3,102),(2,103),(1,104)
,(5,101),(4,102),(3,103),(2,104),(1,105),(6,101),(5,102),(4,103),(3,104),(2,105)]
With some searching on SO I've also found an answer by Norman Ramsey with seemingly yet another way to generate the sequence, splitting these sub-streams into four areas - top-left tip, top row, left column, and recursively the rest. His merge there is the same as our +/ here.
Your second definition,
genFromPair (e1, e2) = [x*e1 + y*e2 | x <- [0..], y <- [0..]]
is equivalent to just
genFromPair (e1, e2) = [0*e1 + y*e2 | y <- [0..]]
Because the list [0..] is infinite there's no chance for any other value of x to come into play. This is the problem that the above definitions all try to avoid.
Prelude> let kart = (\xs ys -> [(x,y) | ls <- map (\x -> map (\y -> (x,y)) ys) xs, (x,y) <- ls])
Prelude> :t kart
kart :: [t] -> [t1] -> [(t, t1)]
Prelude> take 10 $ kart [0..] [1..]
[(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9),(0,10)]
Prelude> take 10 $ kart [0..] [5..10]
[(0,5),(0,6),(0,7),(0,8),(0,9),(0,10),(1,5),(1,6),(1,7),(1,8)]
you can think of the sequence as
0: (0, 0)
/ \
1: (1,0) (0,1)
/ \ / \
2: (2,0) (1, 1) (0,2)
...
Each level n can be expressed as [(n,0), (n-1, 1), (n-2, 2), ..., (0, n)].
Doing this for n <- [0..], we have
cartesianProducts = [(n-m, m) | n<-[0..], m<-[0..n]]
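One way to use this enumeration for the original question is to treat the pairs as indices into the two lists. The sketch below does that with (!!); the names cartesianIndices and kartDiag are invented here, and the approach assumes both lists are infinite (on a finite list (!!) would fail) and makes no attempt at efficiency.

```haskell
-- Index pairs, one diagonal (level n) at a time.
cartesianIndices :: [(Int, Int)]
cartesianIndices = [ (n - m, m) | n <- [0 ..], m <- [0 .. n] ]

-- Pair up two infinite lists by walking the diagonals.
-- Assumes both inputs are infinite; (!!) is linear, so this is slow.
kartDiag :: [a] -> [b] -> [(a, b)]
kartDiag xs ys = [ (xs !! i, ys !! j) | (i, j) <- cartesianIndices ]
```

Every index pair (i, j) appears on diagonal i + j, so every pair of elements is reached after finitely many steps.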