I would like to construct a regular expression from this grammar:
S -> bbD
D -> dD | dCbb
C -> cccC | cccE
E -> Eb | b
What I believe the regular expression should be:
(bb)(d+)(ccc+)(bbb+)
If this isn't correct, can someone point me in the right direction so I can learn how to do it? Cheers.
You are wrong on the (ccc+) part. c must always occur in triples (either from cccC or cccE), but (ccc+) also allows cccc. That part should be (ccc)+ instead. The rest seems correct at first glance. Technically, the last part should be (b+bb), but that is of course equivalent to (bbb+).
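A quick way to check this is to build words straight from the grammar and test them against both expressions. The following is only a sketch: the helper `derive` and the sample counts are made up for illustration, and the c part is written as (ccc)+ so that c's can only appear in blocks of three.

```python
import re

# Candidate fix: c's come in blocks of three, so the whole group repeats.
CORRECT = re.compile(r"bbd+(ccc)+b+bb")
# The expression from the question, unchanged.
ORIGINAL = re.compile(r"(bb)(d+)(ccc+)(bbb+)")


def derive(d_count: int, c_blocks: int, b_count: int) -> str:
    """Build the word the grammar derives for the given repetition counts.

    S -> bbD, D -> dD | dCbb, C -> cccC | cccE, E -> Eb | b
    """
    return "bb" + "d" * d_count + "ccc" * c_blocks + "b" * b_count + "bb"


# Every word the grammar can derive (for small counts) should match.
words = [derive(d, c, b) for d in range(1, 4) for c in range(1, 4) for b in range(1, 4)]
all_match = all(CORRECT.fullmatch(w) for w in words)

# This word has four c's, which the grammar cannot produce.
bad_word = "bbdccccbbb"
original_accepts_bad = ORIGINAL.fullmatch(bad_word) is not None
correct_accepts_bad = CORRECT.fullmatch(bad_word) is not None

print(all_match, original_accepts_bad, correct_accepts_bad)
```

The original expression accepts the four-c word, while the version with (ccc)+ rejects it.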
I'm trying to understand the equivalence between regular expressions α and β defined below, but I'm losing my mind over conflicting information.
a+b: a or b
ab: concatenation of a and b
$: empty string
α = (1*+0)+(1*+0)(0+1)*($+0+1)
β = (1*+0)(0+1)*($+0+1)
https://ivanzuzak.info/noam/webapps/regex_simplifier/ says that α is equivalent to β.
My school however teaches that concatenation has stronger binding than union, meaning that:
11*+0 =/= 1(1*+0)
Which would mean that my α looks like this with parentheses:
α = (1*+0) + ( (1*+0)(0+1)*($+0+1) )
and that
α =/= ( (1*+0) + (1*+0) ) (0+1)*($+0+1)
I hope it's clear what my problem is, I'd appreciate any kind of help. Thanks.
Usually, two regular expressions are considered equivalent when they match the same set of words.
How they match it is not relevant. Therefore it doesn't matter which of the operators has greater precedence.
Note the subtle difference between being equal (in written form) and being equivalent (having the same effect).
Alright, it turns out that I had misunderstood why b+b <=> b.
The reason is that L1∪L2 = L2 if L1 is a subset of L2.
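The equivalence can also be checked by brute force. The sketch below assumes a translation into Python's `re` syntax, where union is written `|` and the empty string `$` becomes an empty alternative; Python, like the school convention, binds concatenation more tightly than union. The helper `language` and the length bound are illustrative choices, and a bounded check like this is evidence rather than a proof.

```python
import re
from itertools import product

# '+' (union) becomes '|', '$' (empty string) becomes an empty alternative.
ALPHA = re.compile(r"(1*|0)|(1*|0)(0|1)*(|0|1)")
BETA = re.compile(r"(1*|0)(0|1)*(|0|1)")


def language(pattern, max_len):
    """All binary words up to max_len that the pattern matches in full."""
    words = (
        "".join(bits)
        for n in range(max_len + 1)
        for bits in product("01", repeat=n)
    )
    return {w for w in words if pattern.fullmatch(w)}


MAX_LEN = 8
same_language = language(ALPHA, MAX_LEN) == language(BETA, MAX_LEN)

# The left operand of the outer union in alpha is a subset of beta's language,
# so adding it changes nothing (L1 ∪ L2 = L2 when L1 ⊆ L2).
left_part = language(re.compile(r"1*|0"), MAX_LEN)
left_is_subset = left_part <= language(BETA, MAX_LEN)

print(same_language, left_is_subset)
```

Both checks hold up to the chosen length: (1*+0) contributes nothing that β does not already match.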
I'm trying to write a regular expression that matches two of the same vowel in a row.
I know this pattern greps for a, but what about e, i, o, u?
(a[aeiou]{2})
Should I write the pattern like this to grep for two of the same vowel?
(a[aeiou]{2}|e[aeiou]{2}|i[aeiou]{2}|o[aeiou]{2}|u[aeiou]{2})
You can simply use a backreference:
([aeiou])\1
See demo https://regex101.com/r/dI9kB9/1
Why not just do:
aa|ee|ii|oo|uu
The bar ( | ) is used for "or".
So this reads as:
aa OR ee OR ii OR oo OR uu
It is also known as "alternation".
See: http://www.regular-expressions.info/alternation.html
It has an example where you can search for dog|cat|mouse|fish, which I would read as "dog OR cat OR mouse OR fish".
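Both suggestions can be compared side by side. This is a small sketch using Python's `re` module; the word list is made up for illustration.

```python
import re

BACKREF = re.compile(r"([aeiou])\1")          # a vowel, then the same vowel again
ALTERNATION = re.compile(r"aa|ee|ii|oo|uu")   # the five doubled vowels spelled out

words = ["book", "see", "radii", "vacuum", "aardvark", "beat", "audio"]

with_backref = [w for w in words if BACKREF.search(w)]
with_alternation = [w for w in words if ALTERNATION.search(w)]

# The pattern from the question matches 'a' followed by ANY two vowels,
# so it accepts "aeon" even though no vowel is doubled there.
question_matches_aeon = re.search(r"a[aeiou]{2}", "aeon") is not None
backref_matches_aeon = BACKREF.search("aeon") is not None

print(with_backref)
print(with_alternation)
```

The two patterns select the same words here. The backreference version is shorter, while the alternation avoids capture groups altogether.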
I have following regular expression: ((abc)+d)|(ef*g?)
I have created a DFA (I hope it is correct) which you can see here
http://www.informatikerboard.de/board/attachment.php?attachmentid=495&sid=f4a1d32722d755bdacf04614424330d2
The task is to create a regular grammar (Chomsky hierarchy Type 3), and I don't quite understand how. This is the regular grammar I came up with:
S → aT
T → b
T → c
T → dS
S → eT
S → eS
T → ε
T → f
T → fS
T → gS
Best Regards
Patrick
Type 3 grammars in the Chomsky hierarchy are the regular grammars, which are restricted to rules of the following forms:
X -> aY
X -> a,
in which X is an arbitrary non-terminal and a is an arbitrary terminal. The rule A -> eps is only allowed if A does not appear on any right-hand side.
Construction
The regular expression consists of two alternatives, either (abc)+d or ef*g?, so our first rules will be S -> aT and S -> eP. These rules let us start building one of the two alternatives. Note that the non-terminals are necessarily different: they correspond to completely disjoint paths in the corresponding automaton. Next we handle the two regexes separately:
(abc)+d
We have at least one sequence abc, followed by zero or more further occurrences of abc, and then a final d. This can be modelled as follows:
S -> aT
T -> bU
U -> cV
V -> aT # repeat pattern
V -> d # finish word
ef*g?
Here we have an e followed by zero or more f characters and an optional g. Since we already have the first character (one of the first two rules gave us that), we continue like this:
S -> eP
S -> e # from the starting state we can simply add an 'e' and be done with it,
# this is an accepted word!
P -> fP # keep adding f chars to the word
P -> f # add f and stop, if optional g doesn't occur
P -> g # stop and add a 'g'
Conclusion
Put these together and they will form a grammar for the language. I tried to write down the train of thought so you could understand it.
As an exercise, try this regex: (a+b*)?bc(a|b|c)*
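To convince yourself that the grammar built above generates the same language as ((abc)+d)|(ef*g?), you can compare the two mechanically. The sketch below encodes the rules as a Python dictionary (the encoding and the helper `derives` are illustrative choices, not a standard API) and compares them with the regex on every word up to a small length, plus a few longer samples.

```python
import re
from itertools import product

# Right-linear rules from the construction: (terminal, next non-terminal or None).
RULES = {
    "S": [("a", "T"), ("e", "P"), ("e", None)],
    "T": [("b", "U")],
    "U": [("c", "V")],
    "V": [("a", "T"), ("d", None)],
    "P": [("f", "P"), ("f", None), ("g", None)],
}
REGEX = re.compile(r"((abc)+d)|(ef*g?)")


def derives(word, symbol="S"):
    """True if `symbol` can derive `word` using RULES."""
    if not word:
        return False  # every rule emits exactly one terminal
    head, tail = word[0], word[1:]
    for terminal, nxt in RULES[symbol]:
        if terminal != head:
            continue
        if nxt is None and not tail:
            return True
        if nxt is not None and derives(tail, nxt):
            return True
    return False


MAX_LEN = 5
candidates = [
    "".join(chars)
    for n in range(MAX_LEN + 1)
    for chars in product("abcdefg", repeat=n)
]
candidates += ["abcabcd", "abcabcabcd", "efffffg", "abcabc", "effgg"]

# Words on which the grammar and the regex disagree (expected to be empty).
mismatches = [w for w in candidates if derives(w) != (REGEX.fullmatch(w) is not None)]
print(mismatches)
```

The same harness can be reused for the exercise regex by swapping in your own rules and pattern.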
I'm getting started with Haskell and I'm trying to use the Alex tool to create regular expressions, but I'm a little bit lost. My first difficulty is the compile step: what do I have to do to compile a file with Alex? After that, I think I have to import the modules that Alex generates into my code, but I'm not sure. If someone can help me, I would be very grateful!
You can specify regular expression functions in Alex.
Here for example, a regex in Alex to match floating point numbers:
$space       = [\ \t\xa0]
$digit       = 0-9
$octit       = 0-7
$hexit       = [$digit A-F a-f]
@sign        = [\-\+]
@decimal     = $digit+
@octal       = $octit+
@hexadecimal = $hexit+
@exponent    = [eE] [\-\+]? @decimal
@number      = @decimal
             | @decimal \. @decimal @exponent?
             | @decimal @exponent
             | 0[oO] @octal
             | 0[xX] @hexadecimal
lex :-
    @sign? @number { strtod }
When we match the floating point number, we dispatch to a parsing function to operate on that captured string, which we can then wrap and expose to the user as a parsing function:
readDouble :: ByteString -> Maybe (Double, ByteString)
readDouble str = case alexScan (AlexInput '\n' str) 0 of
    AlexEOF            -> Nothing
    AlexError _        -> Nothing
    AlexToken (AlexInput _ rest) n _ ->
        case strtod (B.unsafeTake n str) of d -> d `seq` Just $! (d , rest)
A nice consequence of using Alex for this regex matching is that the performance is good, as the regex engine is compiled statically. It can also be exposed as a regular Haskell library built with cabal. For the full implementation, see bytestring-lexing.
The general advice on when to use a lexer instead of a regex matcher would be that, if you have a grammar for the lexemes you're trying to match, as I did for floating point, use Alex. If you don't, and the structure is more ad hoc, use a regex engine.
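For comparison, here is roughly what the same lexeme grammar looks like with a plain regex engine. This is a sketch in Python rather than Haskell, and the names (`NUMBER`, `scan_number`) are made up for illustration. One difference worth noting: Alex picks the longest match, whereas Python's `re` tries alternatives from left to right, so the plain decimal alternative is listed last.

```python
import re

# Named fragments mirroring the macros in the lexer above.
DECIMAL = r"[0-9]+"
OCTAL = r"[0-7]+"
HEXADECIMAL = r"[0-9A-Fa-f]+"
EXPONENT = rf"[eE][-+]?{DECIMAL}"
NUMBER = (
    rf"(?:{DECIMAL}\.{DECIMAL}(?:{EXPONENT})?"
    rf"|{DECIMAL}{EXPONENT}"
    rf"|0[oO]{OCTAL}"
    rf"|0[xX]{HEXADECIMAL}"
    rf"|{DECIMAL})"
)
SIGNED_NUMBER = re.compile(rf"[-+]?{NUMBER}")


def scan_number(text):
    """Return (lexeme, rest) for a number at the start of text, else None."""
    match = SIGNED_NUMBER.match(text)
    if match is None:
        return None
    return match.group(0), text[match.end():]


print(scan_number("1.5e10 rest"))
print(scan_number("0x1F"))
print(scan_number("abc"))
```

Unlike the Alex version, this only recognises the lexeme; converting it to a number would be a separate step.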
Why do you want to use Alex to create regular expressions?
If all you want is to do some regex matching etc., you should look at the regex-base package.
If it is plain regex matching you want, the API is specified in Text.Regex.Base. Then there are the implementations Text.Regex.Posix, Text.Regex.PCRE and several others. The Haddock documentation is a bit slim; however, the basics are described in Real World Haskell, chapter 8. Some more in-depth material is described in this SO question.
How can I validate linear equations with regular expressions, or is there another way besides using regular expressions? I will use ^ to denote an exponent.
2x + 3 = 8 //This should validate fine
3x + 2y + 4z = 12 //This should validate fine
4x^2 + 2y = 22 //This should not validate because of the power.
4xy + 3y = 45 //This should not validate because of the product of two unknowns.
2/x + 4y = 22 //This should not validate because of the unknown in the denominator
(3/4)x + 3y + 2z = 40 //This should validate fine.
I'd start by writing a definition of a valid linear equation using Backus-Naur notation, with things like:
<integer> := <digit> | <integer> <digit>
<constant> := <integer> | ...
<variable> := <letter>
<term> := <constant> | <variable> | <constant> <variable>
and so on.
There are many ways to turn that into a validator. Having some experience with it, I'd use yacc or bison to write a parser that would only generate a parse tree if the input was a valid linear equation.
You might find regular expressions are too limited to do what you need - I just don't use them enough to know.
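As a lighter-weight alternative to yacc or bison, a grammar like this can be turned into a small hand-written recursive-descent validator. The sketch below fills in the parts the grammar above leaves open with assumptions of its own: a constant is an integer or a parenthesised fraction such as (3/4), a variable is a single letter, and an equation is two sums of terms separated by one "=". All function names are hypothetical.

```python
import re

# One token: an integer, a single letter, or one of the symbols - + = / ( ) ^
TOKEN = re.compile(r"\s*(\d+|[A-Za-z]|[-+=/()^])")


def tokenize(text):
    """Split text into tokens, or return None on an unknown character."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return None
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_constant(tokens, i):
    """<constant> := <integer> | "(" <integer> "/" <integer> ")"."""
    if i < len(tokens) and tokens[i].isdigit():
        return i + 1
    if tokens[i:i + 1] == ["("] and len(tokens) >= i + 5:
        a, slash, b, close = tokens[i + 1:i + 5]
        if a.isdigit() and slash == "/" and b.isdigit() and close == ")":
            return i + 5
    return None


def parse_term(tokens, i):
    """<term> := <constant> | <variable> | <constant> <variable>."""
    after_const = parse_constant(tokens, i)
    j = i if after_const is None else after_const
    if j < len(tokens) and tokens[j].isalpha():
        return j + 1
    return after_const


def parse_side(tokens, i):
    """<side> := <term> { ("+" | "-") <term> } (an assumed rule)."""
    i = parse_term(tokens, i)
    while i is not None and i < len(tokens) and tokens[i] in ("+", "-"):
        i = parse_term(tokens, i + 1)
    return i


def is_linear_equation(text):
    """True if text is <side> "=" <side> under the assumed grammar."""
    tokens = tokenize(text)
    if tokens is None or tokens.count("=") != 1:
        return False
    i = parse_side(tokens, 0)
    if i is None or i >= len(tokens) or tokens[i] != "=":
        return False
    return parse_side(tokens, i + 1) == len(tokens)


for example in ["2x + 3 = 8", "4x^2 + 2y = 22", "(3/4)x + 3y + 2z = 40"]:
    print(example, is_linear_equation(example))
```

Because a term allows at most one variable and never a variable after "/" or before "^", the invalid examples from the question are rejected without any special cases.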
The cases you've mentioned are easy:
fail if /[xyz]\s*\^/;
fail if /\/\s*[xyz]/;
fail if /([xyz]\s*){2,}/;
(this is Perl syntax, assuming $_ contains the expression, and fail is whatever it is you do when you want to give up.)
Here you can replace xyz with whatever is a valid expression for one variable.
But in general this will require actual parsing of the expression, which is a job for lex/yacc or something like that, not a regular expression.
For example if "xy" is a legitimate variable name, then of course this all crumbles.
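The three Perl checks translate directly to other regex engines. Here is a sketch in Python, assuming single-letter variables x, y and z as in the Perl version; `looks_linear` is a made-up name. Like the original, it only rejects the known bad shapes and does not verify that the rest of the input is well formed.

```python
import re

VAR = r"[xyz]"  # assumption: single-letter variables, as in the Perl checks

REJECT = [
    re.compile(rf"{VAR}\s*\^"),           # variable raised to a power
    re.compile(rf"/\s*{VAR}"),            # variable in a denominator
    re.compile(rf"(?:{VAR}\s*){{2,}}"),   # two unknowns multiplied together
]


def looks_linear(equation):
    """True if none of the rejection patterns is found in the equation."""
    return not any(pattern.search(equation) for pattern in REJECT)


examples = [
    "2x + 3 = 8",
    "3x + 2y + 4z = 12",
    "4x^2 + 2y = 22",
    "4xy + 3y = 45",
    "2/x + 4y = 22",
    "(3/4)x + 3y + 2z = 40",
]
for equation in examples:
    print(equation, looks_linear(equation))
```

Changing `VAR` is the one place to adjust if variable names are longer than one letter, though the product check then needs rethinking, as noted above.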