Regular Grammar to my Regex/DFA - regex

I have the following regular expression: ((abc)+d)|(ef*g?)
I have created a DFA (I hope it is correct), which you can see here:
http://www.informatikerboard.de/board/attachment.php?attachmentid=495&sid=f4a1d32722d755bdacf04614424330d2
The task is to create a regular grammar (Chomsky hierarchy Type 3) for it, and I don't see how. This is the regular grammar I came up with:
S → aT
T → b
T → c
T → dS
S → eT
S → eS
T → ε
T → f
T → fS
T → gS
Best Regards
Patrick

Type 3 grammars in the Chomsky hierarchy are the regular grammars, which are restricted to rules of the following forms:
X -> aY
X -> a
where X and Y are arbitrary non-terminals and a is an arbitrary terminal. The rule A -> eps is only allowed if A does not appear on any right-hand side.
Construction
We notice that the regular expression consists of two alternatives, either (abc)+d or ef*g?, so our first rules will therefore be S -> aT and S -> eP. These rules let us start building one of the two alternatives. Note that the non-terminals are necessarily different, because they correspond to completely disjoint paths in the corresponding automaton. Next we continue with both regexes separately:
(abc)+d
We have at least one sequence abc, followed by zero or more further occurrences and then a closing d. It's not hard to see that we can model it like this:
S -> aT
T -> bU
U -> cV
V -> aT # repeat pattern
V -> d # finish word
ef*g?
Here we have an e followed by zero or more f characters and an optional g. Since we already have the first character (one of the first two rules gave us that), we continue like this:
S -> eP
S -> e # from the starting state we can simply add an 'e' and be done with it,
# this is an accepted word!
P -> fP # keep adding f chars to the word
P -> f # add f and stop, if optional g doesn't occur
P -> g # stop and add a 'g'
Conclusion
Put these together and they will form a grammar for the language. I tried to write down the train of thought so you could understand it.
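To check the combined grammar, here is a minimal Python sketch (not part of the original answer). It encodes the productions above and compares them against the regex on every short word. The names GRAMMAR, derives and agrees_up_to are made up for this illustration, and the test alphabet a..g is an assumption.

```python
import re
from itertools import product

# Productions of the grammar built above:
# nonterminal -> list of (terminal, next nonterminal), where None means "stop here".
GRAMMAR = {
    "S": [("a", "T"), ("e", "P"), ("e", None)],
    "T": [("b", "U")],
    "U": [("c", "V")],
    "V": [("a", "T"), ("d", None)],
    "P": [("f", "P"), ("f", None), ("g", None)],
}

def derives(word, start="S"):
    """Return True if the grammar can derive `word` from `start`."""
    # Track every nonterminal the derivation could currently be at.
    current = {start}
    for ch in word:
        following = set()
        for nonterminal in current:
            if nonterminal is None:
                continue  # a finished derivation cannot consume more input
            for terminal, target in GRAMMAR[nonterminal]:
                if terminal == ch:
                    following.add(target)
        current = following
    # The word is derived if some derivation finished exactly at its end.
    return None in current

PATTERN = re.compile(r"((abc)+d)|(ef*g?)")

def agrees_up_to(max_len, alphabet="abcdefg"):
    """Compare grammar and regex on every word of length 1..max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if derives(word) != bool(PATTERN.fullmatch(word)):
                return False
    return True
```

Calling agrees_up_to(5) compares the two on all words of up to five letters. This is a sanity check on short words, not a proof of equivalence.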
As an exercise, try this regex: (a+b*)?bc(a|b|c)*

Related

(Ocaml) Using 'match' to extract list of chars from a list of chars

I have just started to learn OCaml and I find it difficult to extract small lists of chars from a bigger list of chars.
Let's say I have:
let list_of_chars = ['#' ; 'a' ; 'b' ; 'c'; ... ; '!' ; '3' ; '4' ; '5' ];;
I know that in the list above there is a '#' followed by a '!' at some location further along in the list.
I want to extract the lists ['a' ;'b' ;'c' ; ...] and ['3' ; '4' ; '5'] and do something with them,
so I do the following:
let variable = match list_of_chars with
| '#'::l1@['!']@l2 -> (*[code to do something with l1 and l2]*)
| _ -> raise Exception ;;
This code doesn't work for me; it throws errors. Is there a simple way of doing this
(specifically using match)?
As another answer points out, you can't use pattern matching for this because pattern matching only lets you use constructors, and @ is not a constructor.
Here is how you might solve your problem:
let split ~equal ~on list =
  let rec go acc = function
    | [] -> None
    | x::xs -> if equal x on then Some (List.rev acc, xs) else go (x::acc) xs
  in
  go [] list
let variable = match list_of_chars with
  | '#'::rest ->
    (* parentheses keep the inner match from swallowing the final case *)
    (match split rest ~on:'!' ~equal:Char.equal with
     | None -> failwith "no '!' found"
     | Some (left,right) ->
       ... (* your code here *))
  | _ -> failwith "list does not start with '#'"
I'm now going to hypothesise that you are trying to do some kind of parsing or lexing. I recommend that you do not do it with a list of chars. Indeed, I think there is almost never a reason to have a list of chars in OCaml: a string is better for a string (a char list has an overhead of 23x in memory usage), and while one might use chars as a kind of mnemonic enum in C, OCaml has actual enums (aka variant types or sum types), so those should usually be used instead. I guess you might end up with a char list if you are doing something with a trie.
If you are interested in parsing or lexing, you may want to look into:
Ocamllex and ocamlyacc
Sedlex
Angstrom or another parser combinator library like it
One of the regular expression libraries (e.g. Re, Re2, Pcre; note that Re and Re2 are mostly unrelated)
Using strings and functions like lsplit2
@ is an operator, not a valid pattern. Patterns need to be static and can't match a varying number of elements in the middle of a list. But since you know the position of ! it doesn't need to be dynamic. You can accomplish it using just the :: constructor:
let variable = match list_of_chars with
| '#'::a::b::c::'!'::l2 -> let l1 = [a;b;c] in ...
| _ -> raise Exception ;;

Converting Grammar to a regular expression

I would like to construct a regular expression from this grammar:
S -> bbD
D -> dD | dCbb
C -> cccC | cccE
E -> Eb | b
What I believe the regular expression should be:
(bb)(d+)(ccc+)(bbb+)
If this isn't correct, can someone point me in the right direction so I learn how to do it? Cheers.
You are wrong on the (ccc+) part. c must always occur in triples (either from cccC or cccE), but (ccc+) also allows cccc; the grouping you want is (ccc)+. The rest seems correct at first glance. Technically, the last part should be (b+bb), but that is of course equivalent to (bbb+).
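The difference between (ccc+) and (ccc)+ can be checked mechanically. The following Python sketch (an illustration, not part of the original answer) enumerates the short words the grammar derives and compares them with both regexes. RULES, generate, CORRECTED and ASKER are names invented here.

```python
import re

# The grammar from the question; uppercase letters are nonterminals.
RULES = {
    "S": ["bbD"],
    "D": ["dD", "dCbb"],
    "C": ["cccC", "cccE"],
    "E": ["Eb", "b"],
}

def generate(max_len):
    """Collect every terminal string of length <= max_len derivable from S."""
    words, seen, todo = set(), {"S"}, ["S"]
    while todo:
        form = todo.pop()
        # Locate the nonterminal (each sentential form of this grammar has at most one).
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No rule shortens a form, so anything longer than max_len can be dropped.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words

CORRECTED = re.compile(r"bbd+(ccc)+b+bb")       # c only in triples
ASKER = re.compile(r"(bb)(d+)(ccc+)(bbb+)")     # the regex from the question
```

For example, ASKER.fullmatch("bbdccccbbb") succeeds although the grammar cannot derive a word with four c's, while every word returned by generate(14) matches CORRECTED.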

Ocaml record matching

Given a basic record
type t = {a:string;b:string;c:string}
why does this code compile
let f t = match t with
{a;b;_} -> a
but this
let f t = match t with
{_;b;c} -> b
and
let f t = match t with
{a;_;c} -> c
does not? I'm asking this out of curiosity, hence the obviously useless code examples.
The optional _ field must be the last field. This is documented as a language extension in Section 7.2
Here's the production for reference:
pattern ::= ...
∣ '{' field ['=' pattern] { ';' field ['=' pattern] } [';' '_' ] [';'] '}'
Because the latter two examples are syntactically incorrect. The syntax allows you to terminate your field name pattern with the underscore to tell the compiler that you're aware there are more fields than you are trying to match. It is used to suppress a warning (which is disabled by default). Here is what the OCaml manual says about it:
Optionally, a record pattern can be terminated by ; _ to convey the fact that not all fields of the record type are listed in the record pattern and that it is intentional. By default, the compiler ignores the ; _ annotation. If warning 9 is turned on, the compiler will warn when a record pattern fails to list all fields of the corresponding record type and is not terminated by ; _. Continuing the point example above,
If you want to match to a name without binding it to a variable, then you should use the following syntax:
{a=_; b; c}
E.g.,
let {a=_; b; c} = {a="hello"; c="cruel"; b="world"};;
val b : string = "world"
val c : string = "cruel"
To add to the answers by Jeffrey Scofield and ivg, what the erroneous examples are trying to achieve can in fact be achieved by using a different order of fields. Like so:
let f t = match t with
{b;c;_} -> b

Ambiguity in transition: How to process string in NFA?

I have built a DFA from a given regular expression to match a test string. In some cases .* occurs in the expression (for example .*ab). Let's say the machine is now in state 1. In the DFA, .* corresponds to a transition from state 1 to itself on every character, and there is another transition from state 1 on 'a'. If the test string contains 'a', which transition should be taken? From state 1 the machine could go to two states, which is not possible in a DFA.
I'll start from the fundamentals and use your example, so that others may find it helpful too.
Any class of automata can have two forms:
Deterministic
Non-Deterministic.
In a deterministic model we have only a single choice (or, say, no choice) when moving from one configuration to the next.
In the deterministic model of finite automata (DFA), for every possible combination of state (Q) and language symbol (Σ) there is always a unique next state.
Definition of the transition function for a DFA: δ: Q×Σ → Q
δ(q0, a) → q1
           ^ only a single choice
So in a DFA every move from one state to the next is fully determined.
Whereas,
In a non-deterministic model we can have more than one choice for the next configuration.
And in the non-deterministic model of finite automata (NFA), the output is a set of states for some combinations of state (Q) and language symbol (Σ).
Definition of the transition function for an NFA: δ: Q×Σ → 2^Q (the power set of Q, i.e. the set of all subsets of Q)
δ(q0, a) → {q1, q2, q3}
           ^ a set, not a single state (more than one choice)
In an NFA we can have more than one choice for the next state. That is what you call ambiguity in the transitions of an NFA.
(your example)
Suppose the language symbols are Σ = {a, b} and the language/regular expression is (a + b)*ab. The finite automaton you drew for this language probably looks like the one below:
Your question is: which state do we move to when there is more than one choice for the next state?
I'll turn it into a more general question:
How to process a string in an NFA?
I am treating the automaton as an acceptor, which accepts a string if it belongs to the language of the automaton. (Note: an automaton can also act as a transducer.) Below is my answer, using the example above.
The NFA above is described by a 5-tuple:
1. Σ : {a, b}
2. Q : {q1, q2, q3}
3. q1: initial state
4. F : {q3} <--- F is the set of final states
5. δ : the transition rules in the diagram above:
δ(q1, a) → { q1, q2 }
δ(q1, b) → { q1 }
δ(q2, b) → { q3 }
The example automaton is actually an NFA because of the transition rule δ(q1, a) → { q1, q2 }: if we read the symbol a while the present state is q1, the next state can be either q1 or q2 (more than one choice). So when we process a string in an NFA, we get an extra path to follow wherever there is a symbol a to be processed while the current state is q1.
A string is accepted by an NFA if there is some sequence of possible moves that puts the machine in a final state at the end of the string. The set of all strings that have some path from the initial state to any final state in F is called the language of the NFA.
We can also write the answer to "what is the language defined by an NFA?" as:
L(nfa) = { w ∈ Σ* | δ*(q1, w) ∩ F ≠ ∅ }
When I was new to this, it seemed too complex to understand, but it really is not.
L(nfa) says: a string w made of language symbols (w ∈ Σ*) is in the language if (|) the set of states reached after processing w from the initial state (δ*(q1, w)) contains some state from the set of final states (hence its intersection with the final states is not empty: δ*(q1, w) ∩ F ≠ ∅). So while processing a string in Σ*, we need to keep track of all the possible paths.
Example-1: processing the string abab through the NFA above:
--►(q1)---a---►(q1)---b---►(q1)---a---►(q1)---b---►(q1)
     \                       \
      a                       a
       \                       \
        ▼                       ▼
       (q2)                    (q2)---b---►((q3))
        |
        b
        |
        ▼
       (q3)
        |
        a
        |
       halt
The diagram above shows how the string abab is processed by the NFA.
A halt means the string could not be processed completely along that path, so that path cannot accept it.
The string abab can be processed completely along two paths, so δ*(q1, w) = { q1, q3},
and the intersection of δ*(q1, w) with the set of final states is {q3}:
{q1, q3} ∩ F
==> {q1, q3} ∩ {q3}
==> {q3} ≠ ∅
Hence the string abab is in the language L(nfa).
Example-2: the string from Σ* is abba, and it is processed as follows:
--►(q1)---a---►(q1)---b---►(q1)---b---►(q1)---a---►(q1)
     \                                   \
      a                                   a
       \                                   \
        ▼                                   ▼
       (q2)                                (q2)
        |
        b
        |
        ▼
       (q3)
        |
        b
        |
       halt
For the string abba the set of reachable states is δ*(q1, w) = { q1, q2}, and no state in this set is a final state. Its intersection with F is therefore the empty set ∅, hence the string abba is not accepted (and is not in the language).
This is how we process a string in a non-deterministic finite automaton.
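The bookkeeping in the two examples can be written down in a few lines. This is a minimal Python sketch of it, assuming the three-state NFA listed above; DELTA, reachable and accepts are names chosen for this illustration.

```python
# Transition rules of the NFA for (a + b)*ab, as listed above.
DELTA = {
    ("q1", "a"): {"q1", "q2"},
    ("q1", "b"): {"q1"},
    ("q2", "b"): {"q3"},
}
START, FINAL = "q1", {"q3"}

def reachable(word):
    """Extended transition function: the set of states reachable after reading `word`."""
    states = {START}
    for symbol in word:
        # Follow every possible move at once; a missing rule means that path halts.
        states = {t for s in states for t in DELTA.get((s, symbol), set())}
    return states

def accepts(word):
    """A word is accepted if the reachable set intersects the final states."""
    return bool(reachable(word) & FINAL)
```

Here reachable("abab") gives the set {q1, q3} from Example-1 and reachable("abba") gives {q1, q2} from Example-2, so only the first string is accepted.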
Some additional important notes:
For finite automata, the deterministic and non-deterministic models are equally capable. The non-deterministic model has no extra power to define languages,
hence the scope of NFAs and DFAs is the same, namely the regular languages. (This is not the case for every class of automata; for example, the scope of PDA != NPDA.)
Non-deterministic models are more useful for theoretical purposes and are comparatively easy to draw, whereas for implementation we always want a deterministic model (minimized for efficiency). Fortunately, in the class of finite automata every non-deterministic model can be converted into an equivalent deterministic one. There is an algorithmic method to convert an NFA into a DFA.
The information represented by a single state in a DFA can be represented by a combination of NFA states, hence an NFA never needs more states than an equivalent DFA. (It can be proved that numberOfStates(DFA) <= 2 power numberOfStates(NFA), since every DFA state corresponds to a set of NFA states and all such sets form the power set.)
The DFA for above regular language is as below:
Using this DFA you will always find a unique path from the initial state for any string in Σ*, and instead of a set you end up in a single reachable state. If that state belongs to the set of final states, the input string is accepted (it is in the language); otherwise it is not.
(Your expression .*ab and (a + b)*ab are the same. In theoretical computer science we usually don't use the . dot operator for anything other than concatenation.)
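The NFA-to-DFA conversion mentioned in the notes is usually done with the subset construction. Below is a minimal Python sketch of it for the three-state NFA of this answer. The names NFA_DELTA, subset_construction and dfa_accepts are hypothetical, and the input is assumed to contain only the symbols a and b.

```python
from collections import deque

# The NFA for (a + b)*ab from this answer.
NFA_DELTA = {
    ("q1", "a"): {"q1", "q2"},
    ("q1", "b"): {"q1"},
    ("q2", "b"): {"q3"},
}
NFA_START, NFA_FINAL, ALPHABET = "q1", {"q3"}, "ab"

def subset_construction():
    """Build a DFA whose states are sets of NFA states, starting from {q1}."""
    start = frozenset({NFA_START})
    states, table, queue = {start}, {}, deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            # The DFA successor is the union of all NFA moves from the current set.
            target = frozenset(
                t for s in current for t in NFA_DELTA.get((s, symbol), ())
            )
            table[(current, symbol)] = target
            if target not in states:
                states.add(target)
                queue.append(target)
    return start, states, table

def dfa_accepts(word):
    """Run the DFA: exactly one state at a time, one table lookup per symbol."""
    state, _, table = subset_construction()
    for symbol in word:
        state = table[(state, symbol)]
    # A DFA state is final if it contains a final NFA state.
    return bool(state & NFA_FINAL)
```

For this NFA the construction reaches only three DFA states, {q1}, {q1, q2} and {q1, q3}, well below the upper bound of 2 power 3.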
Matches with such regular expressions happen via backtracking. When there is an ambiguity about the next state, the evaluation takes the first choice and remembers it made the choice. If taking the first choice results in a failure to match, the evaluation backtracks to the last choice it made and tries the next available choice from that state.
I'm not sure such a mechanism maps to a strict DFA.
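As a rough illustration of the backtracking idea, here is a small Python sketch that tries the choices of the (a + b)*ab automaton from the other answer one at a time. It is a simplification under that assumption; real regex engines are considerably more elaborate. CHOICES and matches are names invented here.

```python
# The NFA for (a + b)*ab, with the choices for each move kept in a fixed order.
CHOICES = {
    ("q1", "a"): ["q1", "q2"],
    ("q1", "b"): ["q1"],
    ("q2", "b"): ["q3"],
}
FINAL = {"q3"}

def matches(word, state="q1", pos=0):
    """Try the first choice; on failure, return to this point and try the next one."""
    if pos == len(word):
        return state in FINAL
    for following in CHOICES.get((state, word[pos]), []):
        if matches(word, following, pos + 1):
            return True
    # Every choice from this state failed, so the caller backtracks.
    return False
```

The call stack plays the role of the "remembered choices": returning False from a recursive call is the backtrack step.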

F# Mapping Regular Expression Matches with Active Patterns

I found this useful article on using Active Patterns with Regular Expressions:
http://www.markhneedham.com/blog/2009/05/10/f-regular-expressionsactive-patterns/
The original code snippet used in the article was this:
open System.Text.RegularExpressions
let (|Match|_|) pattern input =
    let m = Regex.Match(input, pattern) in
    if m.Success then Some (List.tl [ for g in m.Groups -> g.Value ]) else None
let ContainsUrl value =
    match value with
    | Match "(http:\/\/\S+)" result -> Some(result.Head)
    | _ -> None
Which would let you know if at least one url was found and what that url was (if I understood the snippet correctly)
Then in the comment section Joel suggested this modification:
Alternative, since a given group may or may not be a successful match:
List.tail [ for g in m.Groups -> if g.Success then Some g.Value else None ]
Or maybe you give labels to your groups and you want to access them by name:
(re.GetGroupNames()
 |> Seq.map (fun n -> (n, m.Groups.[n]))
 |> Seq.filter (fun (n, g) -> g.Success)
 |> Seq.map (fun (n, g) -> (n, g.Value))
 |> Map.ofSeq)
After trying to combine all of this I came up with the following code:
let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"
let (|Match|_|) pattern input =
    let re = new Regex(pattern)
    let m = re.Match(input) in
    if m.Success then Some ((re.GetGroupNames()
                                |> Seq.map (fun n -> (n, m.Groups.[n]))
                                |> Seq.filter (fun (n, g) -> g.Success)
                                |> Seq.map (fun (n, g) -> (n, g.Value))
                                |> Map.ofSeq)) else None
let GroupMatches stringToSearch =
    match stringToSearch with
    | Match "(http:\/\/\S+)" result -> printfn "%A" result
    | _ -> ()
GroupMatches testString;;
When I run my code in an interactive session this is what is output:
map [("0", "http://www.bob.com"); ("1", "http://www.bob.com")]
The result I am trying to achieve would look something like this:
map [("http://www.bob.com", 2); ("http://www.b.com", 1); ("http://www.bill.com", 1);]
Basically a mapping of each unique match found followed by the count of the number of times that specific matching string was found in the text.
If you think I'm going down the wrong path here please feel free to suggest a completely different approach. I'm somewhat new to both Active Patterns and Regular Expressions so I have no idea where to even begin in trying to fix this.
I also came up with this which is basically what I would do in C# translated to F#.
let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"
let matches =
    let matchDictionary = new Dictionary<string,int>()
    for mtch in (Regex.Matches(testString, "(http:\/\/\S+)")) do
        for m in mtch.Captures do
            if(matchDictionary.ContainsKey(m.Value)) then
                matchDictionary.Item(m.Value) <- matchDictionary.Item(m.Value) + 1
            else
                matchDictionary.Add(m.Value, 1)
    matchDictionary
Which returns this when run:
val matches : Dictionary = dict [("http://www.bob.com", 2); ("http://www.b.com", 1); ("http://www.bill.com", 1)]
This is basically the result I am looking for, but I'm trying to learn the functional way to do this, and I think that should include active patterns. Feel free to try to "functionalize" this if it makes more sense than my first attempt.
Thanks in advance,
Bob
Interesting stuff, I think everything you are exploring here is valid. (Partial) active patterns for regular expression matching work very well indeed. Especially when you have a string which you want to match against multiple alternative cases. The only thing I'd suggest with the more complex regex active patterns is that you give them more descriptive names, possibly building up a collection of different regex active patterns with differing purposes.
As for your C# to F# example, you can have functional solution just fine without active patterns, e.g.
let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"
let matches input =
    Regex.Matches(input, "(http:\/\/\S+)")
    |> Seq.cast<Match>
    |> Seq.groupBy (fun m -> m.Value)
    |> Seq.map (fun (value, groups) -> value, (groups |> Seq.length))
//FSI output:
> matches testString;;
val it : seq<string * int> =
  seq
    [("http://www.bob.com", 2); ("http://www.b.com", 1);
     ("http://www.bill.com", 1)]
Update
The reason why this particular example works fine without active patterns is because 1) you are only testing one pattern, 2) you are dynamically processing the matches.
For a real-world example of active patterns, let's consider a case where 1) we are testing multiple regexes, and 2) we are testing for one regex match with multiple groups. For these scenarios I use the following two active patterns, which are a bit more general than the first Match active pattern you showed: I do not discard the first group in the match, and I return a list of the Group objects, not just their values. One uses the compiled regex option for static regex patterns, the other uses the interpreted regex option for dynamic regex patterns. Because the .NET regex API is so feature-rich, what you return from your active pattern is really up to what you find useful. But returning a list of something is good, because then you can pattern match on that list.
let (|InterpretedMatch|_|) pattern input =
    if input = null then None
    else
        let m = Regex.Match(input, pattern)
        if m.Success then Some [for x in m.Groups -> x]
        else None
///Match the pattern using a cached compiled Regex
let (|CompiledMatch|_|) pattern input =
    if input = null then None
    else
        let m = Regex.Match(input, pattern, RegexOptions.Compiled)
        if m.Success then Some [for x in m.Groups -> x]
        else None
Notice also how these active patterns consider null a non-match, instead of throwing an exception.
OK, so let's say we want to parse names. We have the following requirements:
Must have first and last name
May have middle name
First, optional middle, and last name are separated by a single blank space in that order
Each part of the name may consist of any combination of one or more letters or digits
Input may be malformed
First we'll define the following record:
type Name = {First:string; Middle:option<string>; Last:string}
Then we can use our regex active pattern quite effectively in a function for parsing a name:
let parseName name =
    match name with
    | CompiledMatch @"^(\w+) (\w+) (\w+)$" [_; first; middle; last] ->
        Some({First=first.Value; Middle=Some(middle.Value); Last=last.Value})
    | CompiledMatch @"^(\w+) (\w+)$" [_; first; last] ->
        Some({First=first.Value; Middle=None; Last=last.Value})
    | _ ->
        None
Notice one of the key advantages we gain here, which is the case with pattern matching in general, is that we are able to simultaneously test that an input matches the regex pattern, and decompose the returned list of groups if it does.