I have a grammar defined as follows:
A -> aA*b | empty_string
Is A a regular expression? I'm confused as to how to interpret a BNF grammar.
No, this question doesn't actually have to do with regular expressions. What you have is a context-free grammar, and context-free grammars can specify languages that can't be described by regular expressions.
Here, A is a non-terminal; that is, it's a symbol that must be expanded by a production rule. Given that you only have one rule, it is also your start symbol - any derivation in this grammar must start from A.
The production rules are
(1) A -> aA*b
(2) A -> empty_string
a and b are terminal symbols - they are in the alphabet of the language, and cannot be expanded. When no nonterminals remain in the string you have derived, you are done.
So this language specifies words that are like balanced parentheses, except with a and b instead of ( and ).
For instance, you could produce ab as follows:
A -> aAb (using 1, expanding the star to one A)
aAb -> ab (using 2)
Similarly, you could produce aabb:
A -> aAb (1)
aAb -> aaAbb (1)
aaAbb -> aabb (2)
Or even aababb:
A -> aAAb (1, expanding the star to two As)
aAAb -> aabAb (1, the first inner A derives ab with an empty star)
aabAb -> aababb (1, likewise for the second inner A)
Get it? The star symbol may be a bit confusing because you have seen it in regular expressions, but it does the same thing here as there. It is called the Kleene closure, and it stands for zero or more repetitions of A.
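If it helps to see the grammar operationally, here is a minimal recursive-descent recognizer for it, sketched in Python (the names parse_A and recognize are my own, not anything standard):

    def parse_A(s, i):
        """Match one A starting at s[i]; return the position after the match.
        Since A can derive the empty string, returning i unchanged means
        A matched empty there."""
        if i < len(s) and s[i] == 'a':       # rule 1: A -> a A* b
            i += 1
            while True:                      # A*: zero or more inner As
                j = parse_A(s, i)
                if j == i:                   # inner A matched empty: stop
                    break
                i = j
            if i < len(s) and s[i] == 'b':
                return i + 1
            raise ValueError("expected 'b' at position %d" % i)
        return i                             # rule 2: A -> empty_string

    def recognize(s):
        try:
            return parse_A(s, 0) == len(s)
        except ValueError:
            return False

    # ab, aabb and aababb are accepted; abab and ba are not
    for w in ["", "ab", "aabb", "aababb", "abab", "ba"]:
        print(repr(w), recognize(w))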
Regular expressions generate regular languages and can be parsed with state machines.
BNF grammars are context-free grammars, which generate context-free languages and can be parsed with push-down automata (stack machines).
Context-free grammars can do everything regular grammars can, and more.
A appears to be a BNF grammar rule. I'm not really sure why you have confused this with a regular expression. Are you confused because it has a * in it? Not everything that has a * is a regular expression.
I have this context-free grammar:
S -> aSb
S -> aSa
S -> bSa
S -> bSb
S -> epsilon
I want to show that this grammar describes a regular language (namely, that it can be represented as a regular expression), but I'm not sure how to do that and be confident I'm not missing any pattern.
I did not see this exact question asked before, which is why I don't think it is a duplicate. I'd like an explanation on this relatively simple example; it was hard for me to follow more complicated examples.
You can build a DFA or a regular expression. Note that every production adds one terminal on each side (or derives ε), so the grammar generates exactly the even-length strings over {a, b}; as a regular expression, that is ((a|b)(a|b))*. The DFA needs just 2 states: from q1 (even) move to q2 (odd) on a or b, and from q2 move back to q1 on a or b. q1 is both the start state and the accepting state.
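As a sketch (the function name is my own), that 2-state machine is easy to simulate:

    # q1 = even number of symbols read so far, q2 = odd.
    # q1 is both the start state and the only accepting state.
    def accepts(w):
        state = "q1"
        for c in w:
            if c not in "ab":
                return False                 # not in the alphabet
            state = "q2" if state == "q1" else "q1"
        return state == "q1"

    # even-length strings over {a, b} are accepted, odd-length ones rejected
    for w in ["", "ab", "ba", "aab", "abba"]:
        print(repr(w), accepts(w))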
Regular expressions are a standard tool used for parsing strings across many languages. However, their scope of applicability is limited: regular expressions can only match flat, list-like structure. There is no way to describe arbitrarily deep nested structures using regular expressions. Question: what technology/framework is as widely used and as standard as regular expressions, but can match tree structures (produce an AST)?
A regular expression describes a finite-state automaton.
Since the late 1960's, the "bread and butter" of parsing (though not necessarily the "state of the art") has been push-down automata generated by parser generators according to "LR" algorithms like LALR(1).
The connection to regular expressions is this: the parsing machine does in fact use rules very similar to regular expressions in order to recognize viable prefixes. The "shift" state transitions among the "core LR(0) items" constitute a finite automaton, and can be described by a regular expression. The recursion is handled thanks to the semantic action of pushing symbols onto a stack when doing the "shifts", and removing them ("reducing"). Reductions rewrite a portion of the stack, and perform a "goto" to another state. This type of goto, together with the stack, is absent in the regular expression automaton.
Parsing Expression Grammars are also related to regular expressions. Regular expressions themselves can be endowed with recursion. Firstly, we can take pieces of regular expressions and give them names, and then construct bigger regular expressions by writing expressions which invoke those names. (Such a feature is found in the lex tool, where you can define a named expression like letters [A-Za-z]+ and refer to it later as {letters}.) Now suppose you allow circular references, like letters [A-Za-z]{letters}?. You now have recursion; the only problem is to adjust the model in order to implement it.
Implementations of so-called "regular expressions" in various modern languages and environments in fact support recursion. Perl-compatible regular expressions (PCRE) support it, for instance.
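For example, Python's third-party regex module (not the standard library re) supports PCRE-style recursion of the whole pattern via (?R). Here is a sketch, with a pattern of my own, that matches arbitrarily deeply nested parentheses:

    import regex  # third-party: pip install regex

    # Matches either the empty string, or "(" + a recursive match + ")".
    # (?R) re-invokes the entire pattern; it is only reached after consuming
    # "(", so the recursion is well-founded.
    par = regex.compile(r"(?:\((?R)\))?")

    for w in ["", "()", "((()))", "(()", ")("]:
        print(repr(w), bool(par.fullmatch(w)))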
Expressions that feature recursion or backreferencing are not handled by the classic NFA compilation route (possibly converted to a DFA); they cannot be.
The above letters recursion can be handled with actual recursion. The ? operator can be implemented as a function which tries to match its respective argument object. If it succeeds, then it consumes whatever it has matched; otherwise it consumes nothing. That is to say, the regular expression can be converted to a syntax tree, and interpreted "as is" rather than compiled to a state machine (or trivially compiled to functions corresponding to the nodes of the tree), and such interpretation can naturally handle recursion. The interpretation then constitutes, effectively, a syntax-directed recursive-descent parser. (Note how I avoided left recursion in defining letters to make that example compatible with this approach.)
Example: parenthesis-matching regex:
par-match := ({par-match})|
This gets compiled to a tree:
          branch-op        <-- "par-match" name points at this node
         /         \
   catenate-op   <empty>
    /       \
  "("    catenate-op
          /        \
    {par-match}    ")"
This can then be converted to a recursive descent parser, or interpreted directly.
Pattern matching starts by invoking the top-level "branch-op". This operator simply tries all of the alternatives. Suppose the input is empty. Then the left alternative will fail: it demands an open parenthesis. So then the right alternative will succeed: empty matches empty. (The operators either "fail" or indicate "success" and consume input.)
But suppose your input is (()). The left catenate-op will in turn invoke its left subtree, which matches and consumes the left parenthesis, leaving ()). It will then invoke its right subtree, another catenate-op. This catenate-op matches its left subtree, which triggers recursion into the top level via the named par-match references. That recursion will match and consume (), leaving ). The catenate-op then invokes its right subtree which matches ). Control returns up to branch-op. (Though the left side of branch-op matched something, branch-op must still try the other alternative; more than one branch can match, and some can match longer than others.)
This is closely related to how Parsing Expression Grammars work.
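To make that concrete, here is a minimal sketch of such a tree interpreter in Python. The operator names mirror the diagram above; everything else (the helper names, the rules table) is my own invention:

    def lit(ch):                    # match one literal character
        def node(s, i):
            return i + 1 if i < len(s) and s[i] == ch else None
        return node

    def cat(left, right):           # catenate-op: match left, then right
        def node(s, i):
            j = left(s, i)
            return right(s, j) if j is not None else None
        return node

    def alt(left, right):           # branch-op: try left, else right
        def node(s, i):
            j = left(s, i)
            return j if j is not None else right(s, i)
        return node

    def empty(s, i):                # <empty>: succeed, consume nothing
        return i

    def ref(table, name):           # {name}: recurse via a name table
        def node(s, i):
            return table[name](s, i)
        return node

    rules = {}
    # par-match := ({par-match})|
    rules["par-match"] = alt(
        cat(lit("("), cat(ref(rules, "par-match"), lit(")"))),
        empty)

    def full_match(s):
        return rules["par-match"](s, 0) == len(s)

    for w in ["", "()", "(())", "(()", ")("]:
        print(repr(w), full_match(w))

One simplification: this alt commits to the first alternative that succeeds, Parsing-Expression-Grammar style, rather than trying every branch as described above; for this particular grammar the outcome is the same.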
Practically speaking, the recursive definition could be encoded into the regex syntax somehow. Say we invent some new operator like (?name:definition), which means "match definition, which is allowed to contain invocations of itself via name". The invocation syntax could be (*name), so that we can write the par-match example as (?par-match:\((*par-match)\)|). The combinations (? and (* are invalid under "classic" regex syntax, and so we can use them for extension.
As a final note, regexes correspond to grammars; that is the fundamental connection between regexes and parsing. That is to say, regexes correspond to a particular subset of grammars which describe only regular languages. An example of a grammar which describes a regular language:
S -> A | B
B -> b
A -> A a | c
Although there is A -> A ... recursion, this is still regular, and corresponds to the regex ca*|b, which is just a more compact way to denote the same language. The grammar lets us notate languages that aren't regular and for which we can't write a regex, but as we have seen, we can extend the regex notation and semantics to express some of these things. Regular expressions aren't separate from grammars. The two aren't counterparts; rather, one is a special case or subset of the other.
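A quick sanity check of that correspondence (the enumeration is my own): B derives b, and A derives c followed by any number of as, so every word of the grammar should fully match ca*|b.

    import re

    # Words of the grammar: B gives "b"; A -> A a | c gives "c", "ca", "caa", ...
    words = {"b"} | {"c" + "a" * n for n in range(6)}
    print(all(re.fullmatch(r"ca*|b", w) for w in words))   # True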
Parser generators like Yacc, Bison, and their derivatives are what you're after. They aren't as widespread as regular expressions because they generate actual C code. There are translations like Jison, for example, which implement the Yacc/Bison syntax in JavaScript. I know there are similar tools for other languages.
I get the impression Parsing expression grammar systems are up and coming though.
The regular expression is (a+)+. Using an NFA, this would give ReDoS attacks for longer strings. What would be the equivalent grammar for this regular expression?
Now I was trying to determine the grammar in multiple steps.
a+ would translate to
S -> a
S -> aS
(a+)+ would translate to
G -> S
G -> SG
I was not sure how to simplify further, or whether it would be a CFG or a CSG. Any suggestions would be of great help.
(a+)+ is equivalent to a+. A possible grammar is
S → A
A → a
A → aA
The grammar is regular (such a grammar must exist, because it's derived from a regular expression). It's therefore also context-free, and therefore also context-sensitive.
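As an aside on the ReDoS point in the question: in a backtracking engine, the redundant nesting in (a+)+ is exactly what blows up, while the equivalent a+ stays fast. A small demonstration with Python's re module (timings are machine-dependent; keep n modest, since the "evil" pattern's running time roughly doubles with each extra character):

    import re, time

    evil = re.compile(r"(a+)+$")   # redundant nesting: exponential backtracking
    safe = re.compile(r"a+$")      # same language, linear behaviour

    for n in (16, 18, 20, 22):
        s = "a" * n + "b"          # a near-miss: the trailing "b" forces failure
        t0 = time.perf_counter(); evil.match(s); t1 = time.perf_counter()
        safe.match(s)
        t2 = time.perf_counter()
        print(n, "evil %.3fs" % (t1 - t0), "safe %.6fs" % (t2 - t1))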
I'm reading about finite automata & grammars in Aho's compiler construction book, and I've been stuck on this grammar for a long time. I don't have a clear idea of how to describe it:
Consider the following Grammar:
S -> ( L ) | a
L -> L , S | S
Note that the parentheses and comma are actually terminals in this language and appear in the sentences accepted by this grammar. Try to describe the language generated by this grammar. Is this grammar ambiguous?
My concern here is: can the language generated by this grammar be described as a regular expression? I'm confused about how to do it. Any help?
To show that the grammar is ambiguous, you need to be able to construct two different parse trees while parsing the same string. Your string will consist of "(", ")", ",", and "a", since those are the only terminal symbols in the grammar.
Try arranging those 4 terminal symbols in a few ways and see if you can show different successful parses, in the spirit of the example ambiguous grammar on Wikipedia.
Immediate left recursion tends to cause problems for some parsers. See if "a,a,a" does anything interesting on "L → L , S | S"...
My concern here is: can the language generated by this grammar be described as a regular expression? I'm confused about how to do that.
A regular expression cannot describe the language of this grammar. Rewriting part of the grammar will make this more apparent:
1. S → ( L )
2. S → a
3. L → L , S
4. L → S
Pay attention to #1 and #4. L can produce S, and S can produce ( L ). This means S can produce ( S ), which can produce ( ( S ) ), ( ( ( S ) ) ), and so on ad infinitum. The key thing is that those parentheses are matched: there is the same number of "(" symbols as ")" symbols.
A regex can't do that.
Regular expressions map to finite automata. Finite automata cannot count. The language L = {0ⁿ1ⁿ : n ∈ ℕ} is not regular. Neither is L = {(ⁿ)ⁿ : n ∈ ℕ}, which is just a substitution of "(" for "0" and ")" for "1". See the first examples section under Regular Languages - Wikipedia. (Notation note: s¹ is s, s² is ss, ..., sⁿ is s repeated n times.)
This means you can't use a regex to describe that part of the language. That puts it in the domain of CFGs, Turing Machines, and pushdown automata.
Regular expressions (and a library to interpret them) are a poor tool for recognizing sentences of a context-free grammar. Instead, you would want to use a parser generator like yacc, bison, or ANTLR.
I think the point of the exercise in Aho's book is to "describe the language" in words, in order to understand whether it is ambiguous. One way to approach it: can you devise a grammatical sentence that can be parsed in two different ways, given the productions of the grammar? If so, the grammar is ambiguous.
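For a sense of what a hand-written parser for this grammar looks like, here is a sketch (all names are my own). The left-recursive rule L → L , S | S, which is exactly the problem flagged above, can be rewritten as L → S ( , S )*: it describes the same language but suits recursive descent.

    def parse(s):
        pos = 0

        def S():
            nonlocal pos
            if pos < len(s) and s[pos] == 'a':       # S -> a
                pos += 1
            elif pos < len(s) and s[pos] == '(':     # S -> ( L )
                pos += 1
                L()
                if pos >= len(s) or s[pos] != ')':
                    raise SyntaxError("expected ')' at %d" % pos)
                pos += 1
            else:
                raise SyntaxError("expected 'a' or '(' at %d" % pos)

        def L():                                     # L -> S (, S)*
            nonlocal pos
            S()
            while pos < len(s) and s[pos] == ',':
                pos += 1
                S()

        S()
        if pos != len(s):
            raise SyntaxError("trailing input at %d" % pos)
        return True

    for w in ["a", "(a)", "(a,a)", "((a,a),a)", "a,a"]:
        try:
            print(repr(w), parse(w))
        except SyntaxError as e:
            print(repr(w), "rejected:", e)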
I'm trying to find out if it's possible to have an example of a CFG for which it is impossible to give a regular expression which accepts the same language.
Any language which requires unbounded counting/remembering can't be expressed as a regular expression.
For example, a language which checks balanced parenthesis:
S -> { S } S
S -> ε
Since a regular machine/expression has only a finite (predefined) number of states, it cannot "remember" arbitrarily much of the earlier input.
As such, recognizing the following language is impossible for a state machine: aⁿbⁿ (n ∈ ℕ)
You could make such a machine for n ≤ x for some fixed x ∈ ℕ, but no state machine can do it for every possible value in ℕ.
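To make the contrast concrete, here is a sketch of the "memory" a finite-state machine lacks: a single unbounded depth counter (the simplest possible pushdown stack) is enough to recognize the balanced-brace language above.

    def balanced(s):
        depth = 0                  # unbounded: this is what finite states cannot do
        for c in s:
            if c == '{':
                depth += 1
            elif c == '}':
                depth -= 1
                if depth < 0:      # a '}' with nothing open
                    return False
            else:
                return False       # not in the alphabet
        return depth == 0

    for w in ["", "{}", "{{}}{}", "}{", "{{}"]:
        print(repr(w), balanced(w))

Since depth can grow without bound, no fixed, finite set of states can track it; that is precisely the counting argument above.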