F# Regex matching chain - regex

As I am not completely happy with F#'s regex implementation for my usage, I wanted to implement a so-called regex chain. It basically works as follows:
The given string s is checked against the first pattern. If it matches, the function associated with that pattern is executed. If it does not, the check continues with the next pattern.
I tried to implement it as follows:
let RegexMatch ((s : string, c : bool), p : string, f : GroupCollection -> unit) =
    if c then
        let m = Regex.Match(s, p)
        if m.Success then
            f m.Groups
            (s, false)
        else (s, c)
    else (s, c)

("my input text", true)
|> RegexMatch("pattern1", fun g -> ...)
|> RegexMatch("pattern2", fun g -> ...)
|> RegexMatch("pattern3", fun g -> ...)
|> .... // more patterns
|> ignore
The problem is, that this code is invalid, as the forward-pipe operator does not seem to pipe tuples or does not like my implementation 'design'.
My question is: Can I fix this code above easily or should I rather implement some other kind of regex chain?

Your function RegexMatch won't support piping, because it has tupled parameters.
First, look at the definition of the pipe:
let (|>) x f = f x
From this, one can clearly see that this expression:
("text", true)
|> RegexMatch("pattern", fun x -> ...)
would be equivalent to this:
RegexMatch("pattern", fun x -> ...) ("text", true)
Does this match your function signature? Obviously not. In your signature, the text/bool pair comes first, and is part of the triple of parameters, together with pattern and function.
To make it work, you need to take the "piped" parameter in curried form and last:
let RegexMatch p f (s, c) = ...
Then you can do the piping:
("input", true)
|> RegexMatch "pattern1" (fun x -> ...)
|> RegexMatch "pattern2" (fun x -> ...)
|> RegexMatch "pattern3" (fun x -> ...)
As an aside, I must note that your approach is not very, ahem, functional. You're basing your whole logic on side effects, which will make your program not composable and hard to test, and probably prone to bugs. You're not reaping the benefits of F#, effectively using it as "C# with nicer syntax".
Also, there are actually well researched ways to achieve what you want. For one, check out Railway-oriented programming (also known as monadic computations).

To me this sounds like what you are trying to implement is Active Patterns.
Using Active Patterns you can use regular pattern matching syntax to match against RegEx patterns:
let (|RegEx|_|) p i =
    let m = System.Text.RegularExpressions.Regex.Match (i, p)
    if m.Success then
        Some m.Groups
    else
        None

[<EntryPoint>]
let main argv =
    let text = "123"
    match text with
    | RegEx @"\d+" g -> printfn "Digit: %A" g
    | RegEx @"\w+" g -> printfn "Word : %A" g
    | _ -> printfn "Not recognized"
    0
Another approach is to use what Fyodor refers to as Railway Oriented Programming:
type RegexResult<'T> =
    | Found of 'T
    | Searching of string

let lift p f = function
    | Found v -> Found v
    | Searching i ->
        let m = System.Text.RegularExpressions.Regex.Match (i, p)
        if m.Success then
            m.Groups |> f |> Found
        else
            Searching i

[<EntryPoint>]
let main argv =
    Searching "123"
    |> lift @"\d+" (fun g -> printfn "Digit: %A" g)
    |> lift @"\w+" (fun g -> printfn "Word : %A" g)
    |> ignore
    0

F# match pattern discriminator not defined issue

I'm in the process of writing a recursive transpose function and have run into a problem. I want to call the isTable function inside the match to verify that the input M is a valid table, but it produces an error and I'm not sure how to fix it:
let isTable list =
    match List.map List.length list |> List.distinct |> List.length with
    | 1 -> true
    | _ -> false

let rec transpose M =
    match M with
    | []::_ -> []
    | (isTable M) -> [] // i want to check here if M is a valid table
    | _ -> (List.map List.head M::transpose(List.map List.tail M))

error FS0039: The pattern discriminator 'isTable' is not defined.
Active patterns are one approach, but the overhead of adding one just for a single use is not worth it. An easy and uncluttered solution would be to use a when clause:
let rec transpose M =
    match M with
    | []::_ -> []
    | _ when isTable M -> []
    | _ -> (List.map List.head M::transpose(List.map List.tail M))
None of the answers yet show how to turn your case into an Active Pattern. This is particularly useful for (1) readability and (2) reusability of code. Assuming you'd need isTable more than once, this can be beneficial.
/// Active pattern, must start with capital letter.
let (|IsTable|_|) list =
    match List.map List.length list |> List.distinct with
    | [_] -> Some list
    | _ -> None

let rec transpose M =
    match M with
    | []::_ -> []
    | IsTable M -> [] // using the active pattern
    | _ ->
        List.map List.head M::transpose(List.map List.tail M)
As an aside, your isTable function matches on the result of List.length. List.length iterates over the whole list and is O(n). Since we're only interested in whether the result has exactly one item, the above approach is more efficient, removing at least one iteration from the code.
Try something like
let rec transpose M =
    match M with
    | []::_ -> []
    | _ ->
        match (isTable M) with
        | true -> [] // i want to check here if M is a valid table
        | _ -> (List.map List.head M::transpose(List.map List.tail M))
As a matter of programming style I'd recommend adding a data constructor like Table so that you can match on it but this should get things working.

how to implement lambda-calculus in OCaml?

In OCaml, it seems to me that "fun" is the binding operator. Does OCaml have built-in substitution? If it does, how is it implemented? Is it implemented using de Bruijn indices?
I just want to know how the untyped lambda calculus can be implemented in OCaml, but I did not find such an implementation.
Like Bromind, I don't exactly understand what you mean by "Does OCaml have built-in substitution?"
As for the lambda calculus, I'm again not sure what you are asking, but if you are talking about writing some sort of lambda-calculus interpreter, then you first need to define your "syntax":
(* de Bruijn index *)
type index = int
type term =
  | Var of index
  | Lam of term
  | App of term * term
So (λx.x) y will be (λ 0) 1 and in our syntax App(Lam (Var 0), Var 1).
And now you need to implement your reduction, substitution and so on. For example you may have something like this:
(* identity substitution: 0 1 2 3 ... *)
let id i = Var i

(* particular case of lift substitution: 1 2 3 4 ... *)
let lift_one i = Var (i + 1)

(* cons substitution: t σ(0) σ(1) σ(2) ... *)
let cons (sigma: index -> term) t = function
  | 0 -> t
  | x -> sigma (x - 1)

(* by definition of substitution:
   1) x[σ] = σ(x)
   2) (λ t)[σ] = λ(t[cons(0, (σ; lift_one))])
      where (σ1; σ2)(x) = (σ1(x))[σ2]
   3) (t1 t2)[σ] = t1[σ] t2[σ]
*)
let rec apply_subs (sigma: index -> term) = function
  | Var i -> sigma i
  | Lam t -> Lam (apply_subs (function
                              | 0 -> Var 0
                              | i -> apply_subs lift_one (sigma (i - 1))
                            ) t)
  | App (t1, t2) -> App (apply_subs sigma t1, apply_subs sigma t2)
As you can see, the OCaml code is just a direct transcription of the definition.
And now the small-step reduction:
let is_value = function
  | Lam _ | Var _ -> true
  | _ -> false

let rec small_step = function
  | App (Lam t, v) when is_value v ->
     apply_subs (cons id v) t
  | App (t, u) when is_value t ->
     App (t, small_step u)
  | App (t, u) ->
     App (small_step t, u)
  | t when is_value t ->
     t
  | _ -> failwith "You will never see me"

let rec eval = function
  | t when is_value t -> t
  | t -> let t' = small_step t in
         if t' = t then t
         else eval t'
For example you can evaluate (λx.x) y:
eval (App(Lam (Var 0), Var 1))
- : term = Var 1
OCaml does not perform normal-order reduction; it uses call-by-value semantics. Some terms of the lambda calculus have a normal form that cannot be reached with this evaluation strategy.
See The Substitution Model of Evaluation, as well as How would you implement a beta-reduction function in F#?.
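To make the call-by-value point concrete, here is a minimal sketch in plain OCaml (it does not use the interpreter from the other answer). The names omega and const42 are made up for this illustration: omega stands in for a term with no normal form, and const42 for a function that ignores its argument. Wrapping the argument in a fun delays its evaluation, which roughly mimics call-by-name.

```ocaml
(* Hypothetical example: a computation that never terminates,
   standing in for a lambda term without a normal form. *)
let rec omega () : int = omega ()

(* A function that ignores its argument, like \x. 42. *)
let const42 _ = 42

(* Under call-by-value the argument is evaluated before the call,
   so the following line would loop forever; it is left commented out:
   let _ = const42 (omega ())
*)

(* Passing a thunk instead delays the argument. The thunk is never
   forced, so the call returns normally. *)
let result = const42 (fun () -> omega ())

let () = Printf.printf "%d\n" result
```

Under normal-order reduction the unused argument would never be reduced, which is why the corresponding lambda term does have a normal form there.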
I don't exactly understand what you mean by saying "Does OCaml have built-in substitution? ...", but concerning how the lambda-calculus can be implemented in OCaml, you can indeed use fun: just replace all the lambdas by fun, e.g.:
For the Church numerals: you know that zero = \f -> (\x -> x), one = \f -> (\x -> f x), so in OCaml you'd have
let zero = fun f -> (fun x -> x)
let succ = fun n -> (fun f -> (fun x -> f (n f x)))
and succ zero gives you one as you expect it, i.e. fun f -> (fun x -> f x) (to highlight it, you can for instance try (succ zero) (fun s -> "s" ^ s) ("0") or (succ zero) (fun s -> s + 1) (0)).
As far as I remember, you can play with let and fun to change the evaluation strategy, but to be confirmed...
N.B.: I put in all the parentheses just to make it clear; some can probably be removed.
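As a small sketch building on the Church numerals above: to_int and add are names introduced here for illustration, they are not part of the original answer. to_int simply applies a numeral to the integer successor and 0, which is the same trick as (succ zero) (fun s -> s + 1) (0).

```ocaml
(* Church numerals as in the answer above. *)
let zero = fun _f -> fun x -> x
let succ = fun n -> fun f -> fun x -> f (n f x)

(* Hypothetical helper: read a numeral back as an OCaml int by
   applying it to "add one" and 0. *)
let to_int n = n (fun k -> k + 1) 0

(* Hypothetical helper: addition, i.e. apply f "n times", then "m times". *)
let add m n = fun f -> fun x -> m f (n f x)

let one = succ zero
let two = succ one

let () =
  Printf.printf "%d %d\n" (to_int two) (to_int (add two one))
```

Note that one and two are results of applications, so OCaml gives them weakly polymorphic types; here they end up being used only at int, which is enough for the example.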

Extracting data from a tuple in OCaml

I'm trying to use the CIL library to parse C source code. I'm searching for a particular function using its name.
let cil_func = Caml.List.find (fun g ->
    match g with
    | GFun(f,_) when (equal f.svar.vname func) -> true
    | _ -> false
  ) cil_file.globals in
let body g = match g with GFun(f,_) -> f.sbody in
dumpBlock defaultCilPrinter stdout 1 (body cil_func)
So I have a type GFun of fundec * location, and I'm trying to get the sbody attribute of fundec.
It seems redundant to do a second pattern match, not to mention, the compiler complains that it's not exhaustive. Is there a better way of doing this?
You can define your own function that returns just the fundec:
let rec find_fundec fname = function
  | [] -> raise Not_found
  | GFun (f, _) :: _ when equal (f.svar.vname fname) -> f (* ? *)
  | _ :: t -> find_fundec fname t
Then your code looks more like this:
let cil_fundec = find_fundec func cil_file.globals in
dumpBlock defaultCilPrinter stdout 1 cil_fundec.sbody
For what it's worth, the line marked (* ? *) looks wrong to me. I don't see why f.svar.vname would be a function. I'm just copying your code there.
Update
Fixed an error (one I often make), sorry.
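Another possible way to avoid the second pattern match, sketched under some assumptions: the standard library's List.find_map (available since OCaml 4.10) can search and extract in one step, returning an option instead of raising Not_found. CIL is not used below; the fundec and global types, their fields, and the sample data are simplified stand-ins invented for this example, so the real code would match on CIL's own GFun and use f.svar.vname.

```ocaml
(* Hypothetical stand-ins for CIL's types, just enough to run the example. *)
type fundec = { name : string; body : string }

type global =
  | GFun of fundec * int   (* the int stands in for CIL's location *)
  | GOther

(* Search and extract in one pass: the first GFun whose name matches
   is returned as Some fundec, otherwise None. *)
let find_fundec fname globals =
  List.find_map
    (function
      | GFun (f, _) when String.equal f.name fname -> Some f
      | _ -> None)
    globals

(* Made-up sample data. *)
let globals = [ GOther; GFun ({ name = "main"; body = "return 0;" }, 1) ]

let () =
  match find_fundec "main" globals with
  | Some f -> print_endline f.body
  | None -> print_endline "not found"
```

Returning an option makes the "not found" case explicit at the call site, which also keeps the compiler's exhaustiveness check happy.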

On defining list length in terms of fold

This is just an exercise (I realize that the functions mentioned below are already implemented in List).
Suppose that I have an interface that includes the following lines
val length : 'a list -> int
val fold : init:'acc -> f:('acc -> 'a -> 'acc) -> 'a list -> 'acc
...and that I implement fold like this:
let rec fold ~init ~f l =
  match l with
  | [] -> init
  | h :: t -> fold ~init:(f init h) ~f:f t
I expected to now be able to implement length like this
let length = fold ~init:0 ~f:(fun c _ -> (c + 1))
...but the compiler complains with
Values do not match:
  val length : '_a list -> int
is not included in
  val length : 'a list -> int
Of course, I know that I can implement length like this
let length l = fold ~init:0 ~f:(fun c _ -> (c + 1)) l
...but I don't understand why I can't remove the trailing l from both sides of the =.
Where am I going wrong?
This is the value restriction. Your definition of length is not a value in a very technical sense. There are some good discussions of the issue already here on Stack Overflow. I'll look for a good one.
Here is a pretty good one:
Why does a partial application have value restriction?
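To illustrate, here is a small self-contained sketch of the difference. length_weak is a name made up for this example. Because its right-hand side is a partial application rather than a function definition, its type variable is only weakly polymorphic and gets fixed by the first use. The eta-expanded length is a syntactic function, so it is fully generalized.

```ocaml
let rec fold ~init ~f l =
  match l with
  | [] -> init
  | h :: t -> fold ~init:(f init h) ~f t

(* Partial application: not a syntactic value, so the element type is a
   weak (not yet determined) type variable. *)
let length_weak = fold ~init:0 ~f:(fun c _ -> c + 1)

(* The first use fixes the weak variable to int. After this line,
   length_weak ["a"] would be rejected by the type checker. *)
let n1 = length_weak [1; 2; 3]

(* Eta-expanded version: a function definition, hence truly polymorphic. *)
let length l = fold ~init:0 ~f:(fun c _ -> c + 1) l

let () =
  Printf.printf "%d %d %d\n" n1 (length [1; 2; 3]) (length ["a"; "b"])
```

This is also why the interface check fails in the question: the signature promises 'a list -> int for every 'a, while the partial application only provides it for one, still undetermined, element type.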

Compose total and partial functions

I can't wrap my head around where I should put parentheses to get this working:
let read_lines filename =
  let channel = open_in filename in
  Std.input_list channel;;

let print_lines filename =
  List.map print_string ((^) "\n") (read_lines filename);;
^ This is the closest I've got so far. In case my terminology is vague: ((^) "\n") is what I call a partial function (well, because it doesn't handle all of its arguments). print_string I call a total function because... well, it handles all of its arguments.
Obviously, what I would like to happen is that:
1. List.map applies first ((^) "\n") to the element of the list.
2. List.map applies print_string to the result of #1.
How? :)
Maybe you want something like that?
# let ($) f g = fun x -> f(g x);;
val ( $ ) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
# let f = print_string $ (fun s -> s^"\n");;
val f : string -> unit = <fun>
# List.iter f ["a";"b";"c";"d"];;
a
b
c
d
- : unit = ()
# let g = string_of_int $ ((+)1) $ int_of_string;;
val g : string -> string = <fun>
# g "1";;
- : string = "2"
Your code didn't work because of missing parentheses:
List.map print_string ((^) "\n") xs
is parsed as
(List.map print_string ((^) "\n")) xs
when you expected
List.map (print_string ((^) "\n")) xs
A few things: List.map is probably not what you want, since it will produce a list (of unit values) rather than just iterating. ((^) "\n") is probably also not what you want, as it prepends a newline, the "\n" being the first argument. (This is not a section as in Haskell, but a straightforward partial application.)
Here's a reasonable solution that is close to what (I think) you want:
let print_lines filename =
  List.iter (fun str -> print_string (str ^ "\n")) (read_lines filename)
But I would rather write
let print_lines filename =
  List.iter (Printf.printf "%s\n") (read_lines filename)
Which is both clearer and more efficient.