What is the formal grammar equivalent to this regex?

The regular expression is (a+)+. Matched with a backtracking (NFA-based) engine, this pattern is open to ReDoS attacks on longer strings. What would be the equivalent grammar for this regular expression?
I tried to derive the grammar in multiple steps.
a+ would translate to
S -> a
S -> aS
(a+)+ would translate to
G -> S
G -> SG
I was not sure how to simplify this further, or whether the result would be a CFG or a CSG. Any suggestions would be a great help.

(a+)+ is equivalent to a+. A possible grammar is
S → A
A → a
A → aA
The grammar is regular (one must exist, because every language described by a regular expression can also be generated by a regular grammar). It is therefore also context-free, and therefore also context-sensitive.
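A quick way to spot-check this (a sketch, not a proof) is to enumerate every string of length at most 6 that a grammar generates and compare the result with what Python's `re.fullmatch` accepts for `a+` and `(a+)+` over the alphabet {a, b}. The dictionary encoding of grammars and the helper `bounded_language` are assumptions made for this illustration; the helper computes, as a fixpoint, all derivable strings up to a length bound. The asker's grammar from the question (G -> S | SG, S -> a | aS) is checked as well.

```python
import re
from itertools import product


def bounded_language(grammar, start, n):
    """Return every terminal string of length <= n derivable from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols). A symbol is a nonterminal iff it is a key of `grammar`; the
    empty tuple () stands for epsilon. Computed as a least fixpoint.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                strings = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    # Extend each partial string, discarding anything too long.
                    strings = {p + s for p in strings for s in options
                               if len(p) + len(s) <= n}
                if not strings <= lang[nt]:
                    lang[nt] |= strings
                    changed = True
    return lang[start]


# The answer's grammar: S -> A, A -> a | aA
answer = {"S": [("A",)], "A": [("a",), ("a", "A")]}
# The asker's grammar: G -> S | SG, S -> a | aS
asker = {"G": [("S",), ("S", "G")], "S": [("a",), ("a", "S")]}

N = 6
all_strings = {"".join(t) for k in range(N + 1) for t in product("ab", repeat=k)}
plain = {w for w in all_strings if re.fullmatch(r"a+", w)}
nested = {w for w in all_strings if re.fullmatch(r"(a+)+", w)}

print(plain == nested)                             # True
print(bounded_language(answer, "S", N) == plain)   # True
print(bounded_language(asker, "G", N) == plain)    # True
```

All three lines print True. This only compares the languages up to the chosen length, so it illustrates the equivalence rather than proving it.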

Related

Regular expression for context free grammar

I have this context-free grammar:
S -> aSb
S -> aSa
S -> bSa
S -> bSb
S -> epsilon
I want to show that this grammar describes a regular language (namely, one that can be represented by a regular expression), but I'm not sure how to do that and be confident I'm not missing any pattern.
I did not see this exact question, which is why I don't think it is a duplicate. I'd like an explanation of this relatively simple example. It was hard for me to follow more complicated examples.
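You need to build a DFA or a regular expression. Each production adds exactly one symbol on each side of S, and every combination of a and b is allowed, so the grammar generates exactly the strings of even length over {a, b}. The DFA therefore has 2 states: from q1 (even length so far), move to q2 (odd) on a or b; from q2, move back to q1 on a or b. q1 is both the start state and the only accepting state. The equivalent regular expression is ((a|b)(a|b))*.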
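To check this mechanically, the sketch below encodes the two-state DFA, generates the grammar's strings directly from its four wrapping productions, and compares both with the regular expression ((a|b)(a|b))* (all strings of even length over {a, b}) for every string up to length 8. The helper names are made up for this example.

```python
import re
from itertools import product

# Two-state DFA: q1 = even number of symbols read so far, q2 = odd.
DELTA = {("q1", "a"): "q2", ("q1", "b"): "q2",
         ("q2", "a"): "q1", ("q2", "b"): "q1"}


def dfa_accepts(w):
    state = "q1"                      # q1 is the start state...
    for ch in w:
        state = DELTA[(state, ch)]
    return state == "q1"              # ...and the only accepting state


def grammar_strings(max_steps):
    """Strings derivable from S -> aSb | aSa | bSa | bSb | epsilon
    using at most `max_steps` non-epsilon productions."""
    level = {""}                      # S -> epsilon
    result = set(level)
    for _ in range(max_steps):
        # One more production S -> xSy wraps every string w as x w y.
        level = {x + w + y for w in level for x, y in ("ab", "aa", "ba", "bb")}
        result |= level
    return result


N = 8
all_strings = {"".join(t) for k in range(N + 1) for t in product("ab", repeat=k)}
from_grammar = grammar_strings(N // 2)
from_dfa = {w for w in all_strings if dfa_accepts(w)}
from_regex = {w for w in all_strings if re.fullmatch(r"((a|b)(a|b))*", w)}
print(from_grammar == from_dfa == from_regex)   # True
```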

How can I find regular grammar for L* if we are given a grammar for language L?

Is there any general method to do so? For example, we have a general method to find a grammar for L1 ∪ L2: add the productions S -> S1 | S2, where S1 and S2 are the start symbols of the grammars for L1 and L2 respectively.
In general, given a grammar G such that L(G) = L', there is no algorithm which always produces a regular grammar G' such that L(G') = (L')*. For starters, (L')* may not be a regular language. Even if you allow the procedure to recognize this case and print "not a regular language", this cannot be possible in general, since it would allow us to decide whether arbitrary unrestricted grammars generate particular strings (the construction is not too hard, but I won't provide it unless desired). That problem is undecidable, so we cannot, in general, recognize whether an unrestricted grammar generates a regular language.
Perhaps your question is whether there is a neat construction to do this if initially given a regular grammar. In that case, the answer is a definite and clear, "yes!" Here is one easily described (though possibly inefficient in practice) procedure for doing just that:
Convert the regular grammar into a nondeterministic finite automaton using the typical construction for doing so. There are easy constructions for left-regular and right-regular grammars.
Construct a regular expression from the nondeterministic finite automaton using any known construction, for example the one used in the standard proof that finite automata and regular expressions are equivalent (Kleene's theorem).
Construct a new regular expression which is the Kleene closure of the one from the last step.
Construct a nondeterministic finite automaton from the regular expression from the last step, using a standard construction.
Construct a regular grammar from the nondeterministic finite automaton from the last step. There are known constructions for this.
Thus, we can mechanically go from regular grammar for L to regular grammar for L*.
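If you just want ANY grammar for L*, the simplest would probably be to introduce a new start symbol S' with the productions S' := S'S' | S | ε, where S is the start symbol of your input grammar (the ε alternative is needed because L* always contains the empty string). This does not give a regular grammar, since S'S' puts two nonterminals on the right-hand side. However, if the input grammar generates a regular language, the new grammar does as well, because regular languages are closed under Kleene star.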
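A minimal sketch of that construction, assuming grammars are encoded as Python dictionaries (an encoding invented here, as are the helper names). It includes an ε alternative for S', since L* always contains the empty string and S' := S'S' | S on its own cannot derive it. Applied to a toy grammar for L = {ab}, a bounded enumeration yields exactly the words of (ab)* up to length 6.

```python
import re


def bounded_language(grammar, start, n):
    """All terminal strings of length <= n derivable from `start`
    (least fixpoint; keys are nonterminals, () is epsilon)."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                strings = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    strings = {p + s for p in strings for s in options
                               if len(p) + len(s) <= n}
                if not strings <= lang[nt]:
                    lang[nt] |= strings
                    changed = True
    return lang[start]


def star_grammar(grammar, start, new_start="S'"):
    """Return (grammar, start) for L* given a grammar for L, using a fresh
    start symbol with the productions S' -> S'S' | S | epsilon."""
    if new_start in grammar:
        raise ValueError("the new start symbol must not already be used")
    starred = dict(grammar)
    starred[new_start] = [(new_start, new_start), (start,), ()]
    return starred, new_start


# Toy input: a grammar whose language is L = {"ab"}.
base = {"S": [("a", "b")]}
starred, s0 = star_grammar(base, "S")
words = bounded_language(starred, s0, 6)
print(sorted(words, key=len))                          # ['', 'ab', 'abab', 'ababab']
print(all(re.fullmatch(r"(ab)*", w) for w in words))   # True
```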
Example: given the regular grammar
S := 0S | 1T
T := 0S | 1T | 1
A construction gives us this nondeterministic finite automaton:
q s q'
- - -
S 0 S
S 1 T
T 0 S
T 1 T
T 1 (H)
A construction gives us the regular expression:
(0*1)(0*1)*1
The Kleene closure of this is:
((0*1)(0*1)*1)*
We recognize from the standard construction that this automaton is equivalent:
q s q'
- - -
(I) - S
S 0 S
S 1 T
T 0 S
T 1 T
T 1 H
H - (I)
Whence the following regular grammar (where - denotes the empty string):
I := S | -
S := 0S | 1T
T := 0S | 1T | 1H
H := I
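The example can be spot-checked by enumerating both grammars up to length 8 and comparing them with the two regular expressions above via Python's `re.fullmatch`. This sketch reads the last alternative of T in the final grammar as 1H, following the T 1 H row of the automaton, and writes `()` for the empty right-hand side; the `bounded_language` helper and the dictionary encoding are assumptions of this illustration, not part of the answer.

```python
import re
from itertools import product


def bounded_language(grammar, start, n):
    """All terminal strings of length <= n derivable from `start`
    (least fixpoint; keys are nonterminals, () is epsilon)."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                strings = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    strings = {p + s for p in strings for s in options
                               if len(p) + len(s) <= n}
                if not strings <= lang[nt]:
                    lang[nt] |= strings
                    changed = True
    return lang[start]


# Grammar for L from the example: S := 0S | 1T, T := 0S | 1T | 1
original = {"S": [("0", "S"), ("1", "T")],
            "T": [("0", "S"), ("1", "T"), ("1",)]}
# Grammar for L*, with T's last alternative read as 1H (row "T 1 H").
starred = {"I": [("S",), ()],          # I := S | -   (- is epsilon)
           "S": [("0", "S"), ("1", "T")],
           "T": [("0", "S"), ("1", "T"), ("1", "H")],
           "H": [("I",)]}

N = 8
all_strings = {"".join(t) for k in range(N + 1) for t in product("01", repeat=k)}


def matching(pattern):
    return {w for w in all_strings if re.fullmatch(pattern, w)}


print(bounded_language(original, "S", N) == matching(r"(0*1)(0*1)*1"))    # True
print(bounded_language(starred, "I", N) == matching(r"((0*1)(0*1)*1)*"))  # True
```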

Regular expression for a grammar

I'm reading about finite automata and grammars in Aho's compiler construction book, and I've been stuck on this grammar for a long time. I don't have a clear idea of how to describe it:
Consider the following Grammar:
S -> (L) | a
L -> L, S | S
Note that the parentheses and comma are actually terminals in this language and appear in the sentences accepted by this grammar. Try to describe the language generated by this grammar. Is this grammar ambiguous?
My concern here is: can the language generated by this grammar be described by a regular expression? I'm confused about how to do it. Any help?
To show that the grammar is ambiguous, you need to construct two different parse trees for the same string. Your string will be composed of "(", ")", ",", and "a", since those are the only terminal symbols in the grammar.
Try arranging those 4 terminal symbols in a few ways and see if you can show different successful parses, in the spirit of the example ambiguous grammar on Wikipedia.
Immediate left recursion tends to cause problems for some parsers. See if "a,a,a" does anything interesting on "L → L , S | S"...
My concern here is whether the language generated by this grammar can be described by a regular expression. I'm confused about how to do that.
A regular expression cannot fully describe the language generated by this grammar. Rewriting the grammar with one production per line makes this more apparent:
S → ( L )
S → a
L → L , S
L → S
Pay attention to the first rule (S → ( L )) and the fourth (L → S). L can produce S, and S can produce ( L ). This means S can produce ( S ), which can produce ( ( S ) ), ( ( ( S ) ) ), and so on ad infinitum. The key point is that these parentheses are matched: there are the same number of "(" symbols as ")" symbols.
A regex can't do that.
Regular expressions map to finite automata, and finite automata cannot count without bound. The language $\{0^n 1^n : n \ge 0\}$ is not regular, and neither is $\{(^n\,)^n : n \ge 0\}$, which is the same language with "(" substituted for "0" and ")" for "1". See the first examples section of the Wikipedia article on regular languages. (Notation note: $s^1$ is s, $s^2$ is ss, ..., $s^n$ is s repeated n times.)
This means you can't use a regex to describe that part of the language. That puts it in the domain of CFGs, Turing Machines, and pushdown automata.
Regular expressions (and a library to interpret them) are a poor tool for recognizing sentences of a context-free grammar. Instead, you would want to use a parser generator like yacc, bison, or ANTLR.
I think the point of the exercise in Aho's book is to "describe the language" in words, in order to understand whether it is ambiguous. One way to approach it: can you devise a grammatical sentence that can be parsed in two different ways, given the productions of the grammar? If so, the grammar is ambiguous.
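Trying candidate sentences by hand gets tedious, so here is a brute-force sketch (function names are hypothetical) that counts the parse trees a sentence has under S -> ( L ) | a, L -> L , S | S, using memoized recursion over substrings. A count above 1 would demonstrate ambiguity; a count of 1 for every string you try does not by itself prove the grammar unambiguous. Note that a bare "a,a,a" is derivable from L but not from the start symbol S, so it only becomes a sentence when wrapped in parentheses.

```python
from functools import lru_cache


def count_parses(sentence):
    """Number of parse trees for `sentence` with start symbol S in
    S -> ( L ) | a,   L -> L , S | S.
    More than one tree means the grammar is ambiguous."""
    s = sentence

    @lru_cache(maxsize=None)
    def S(i, j):
        # Parse trees rooted at S whose yield is exactly s[i:j].
        total = 0
        if s[i:j] == "a":
            total += 1                         # S -> a
        if j - i >= 2 and s[i] == "(" and s[j - 1] == ")":
            total += L(i + 1, j - 1)           # S -> ( L )
        return total

    @lru_cache(maxsize=None)
    def L(i, j):
        # Parse trees rooted at L whose yield is exactly s[i:j].
        total = S(i, j)                        # L -> S
        for k in range(i + 1, j - 1):          # L -> L , S with the comma at s[k]
            if s[k] == ",":
                total += L(i, k) * S(k + 1, j)
        return total

    return S(0, len(s))


for sentence in ["a", "(a)", "(a,a,a)", "((a),a)", "a,a,a"]:
    print(sentence, count_parses(sentence))
# a 1, (a) 1, (a,a,a) 1, ((a),a) 1, a,a,a 0
```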

Context Free Grammar for which a RegEx is impossible

I'm trying to find out whether there is an example of a CFG for which it is impossible to give a regular expression that accepts the same language.
Any language that requires unbounded counting or remembering can't be expressed as a regular expression.
For example, a language which checks balanced parenthesis:
S -> { S } S
S -> ε
Since a regular machine/expression has only a limited (pre-defined) number of states, it cannot "remember" (infinitely) earlier parts of the input.
As such, recognizing the language $\{a^n b^n : n \in \mathbb{N}\}$ is impossible for a finite state machine.
You could make such a machine for n ≤ x, where x∈ℕ, but no state-machine can do it for every possible value from ℕ.
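To make the last point concrete, the sketch below builds, for a chosen bound x, a DFA that accepts a^n b^n only for n ≤ x (it has 2x + 2 states), and contrasts it with a recognizer that keeps an unbounded counter, which is essentially what a pushdown automaton's stack provides for this language. Function names are illustrative only.

```python
def bounded_dfa(x):
    """Transition function of a DFA for { a^n b^n : 0 <= n <= x }.

    ("up", k): k a's read so far; ("down", k): k a's still unmatched;
    "dead": reject. That is (x + 1) + x + 1 = 2x + 2 states in total.
    """
    def step(state, ch):
        if state == "dead":
            return "dead"
        phase, k = state
        if ch == "a" and phase == "up" and k < x:
            return ("up", k + 1)       # still counting a's, within the bound
        if ch == "b" and k > 0:
            return ("down", k - 1)     # match one b against an earlier a
        return "dead"                  # too many a's, a after b, or extra b
    return step


def dfa_accepts(x, w):
    step = bounded_dfa(x)
    state = ("up", 0)
    for ch in w:
        state = step(state, ch)
    return state in (("up", 0), ("down", 0))


def counter_accepts(w):
    """Recognizes a^n b^n for every n using one unbounded counter."""
    count, seen_b = 0, False
    for ch in w:
        if ch == "a":
            if seen_b:
                return False           # an a after a b is never allowed
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False           # more b's than a's so far
        else:
            return False
    return count == 0


x = 3
for n in range(6):
    w = "a" * n + "b" * n
    print(n, dfa_accepts(x, w), counter_accepts(w))
# n = 0..3 -> True True; n = 4 and 5 -> False True
```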

Is this context-free grammar a regular expression?

I have a grammar defined as follows:
A -> aA*b | empty_string
Is A a regular expression? I'm confused as to how to interpret a BNF grammar.
No, this question doesn't actually have to do with regular expressions. Context-free grammars can specify languages that can't be described by regular expressions.
Here, A is a nonterminal; that is, a symbol that must be expanded by a production rule. Since you only have one nonterminal, it is also your start symbol: every derivation in this grammar starts from A.
The production rules are
(1) A -> aA*b
(2) A -> empty_string
a and b are terminal symbols: they are in the alphabet of the language and cannot be expanded. When the string you are deriving contains no more nonterminals, you are done.
So this language specifies words that are like balanced parentheses, except with a and b instead of ( and ).
For instance, you could produce ab as follows:
A -> aA*b (using 1)
aAb -> ab (using 2)
Similarly, you could produce aabb:
A -> aA*b (1)
aAb -> aaA*bb (1)
aaAbb -> aabb (2)
Or even aababb:
A -> aAAb (1, taking A* as AA)
aAAb -> aabAb (1 on the first A, taking A* as empty)
aabAb -> aababb (1 on the second A, taking A* as empty)
Get it? The star symbol may be a bit confusing because you have seen it in regular expressions, but it actually does the same thing here as there. It is called the Kleene closure (Kleene star), and A* represents every sequence of zero or more As.
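Plain BNF has no star, so one way to experiment with this grammar is to replace A* with a helper nonterminal, here called R (a name introduced for this sketch): A -> a R b | ε and R -> A R | ε. The snippet enumerates all words of length at most 6 and checks that each is balanced when a is read as ( and b as ). The `bounded_language` helper and the dictionary encoding are assumptions of this illustration.

```python
def bounded_language(grammar, start, n):
    """All terminal strings of length <= n derivable from `start`
    (least fixpoint; keys are nonterminals, () is epsilon)."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                strings = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    strings = {p + s for p in strings for s in options
                               if len(p) + len(s) <= n}
                if not strings <= lang[nt]:
                    lang[nt] |= strings
                    changed = True
    return lang[start]


# A -> a A* b | empty_string, with A* rewritten as R -> A R | epsilon.
grammar = {
    "A": [("a", "R", "b"), ()],
    "R": [("A", "R"), ()],
}


def is_balanced(w):
    """True if w is balanced when a is read as '(' and b as ')'."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0


words = bounded_language(grammar, "A", 6)
print(sorted(words, key=lambda w: (len(w), w)))
print(all(is_balanced(w) for w in words))
```

It prints ['', 'ab', 'aabb', 'aaabbb', 'aababb'] and True. Note that abab is balanced but not derivable from A, since every nonempty word is wrapped in one outer a ... b pair.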
Regular Expressions generate Regular languages and can be parsed with State Machines.
BNF grammars are context-free grammars, which generate context-free languages and can be parsed with pushdown automata (stack machines).
Context Free Grammars can do everything Regular Grammars can and more.
A appears to be a BNF grammar rule. I'm not really sure why you confused it with a regular expression. Is it because it has a * in it? Not everything that has a * is a regular expression.