I am currently studying CFGs and saw this answer, but I am not sure how they got it. How did they convert the CFG into a regular expression here?
S -> aS|bX|a
X -> aX|bY|a
Y -> aY|a
answer:
R.E -> (a*(a+ba*a+ba*ba*a))
You should learn the basic rules that I have written in my answer "constructing an equivalent regular grammar from a regular expression"; those rules will help you convert a regular expression into a right- or left-linear grammar, or a right- or left-linear grammar into a regular expression.
Note that more than one regular expression (and more than one grammar or automaton) is possible for a given language. Below, I explain how to find the regular expression given as the answer in your textbook. Read each step and the linked answer(s) carefully so that you can solve such questions yourself next time.
As a first step in answering such a question, you should be clear about which language the grammar generates (similarly, if you have an automaton, try to understand the language it represents).
As I said in the linked answer, grammar rules like S → eS | e correspond to "plus closure" and generate the strings e+. Similarly, your grammar has three pairs of such rules that generate a+:
S → aS | a
X → aX | a
Y → aY | a
(Note: a+ can also be written as a*a or aa* – describes one or more 'a'.)
Also notice that the grammar has no "null production" such as A → ∧, so none of the variables S, X or Y is nullable. This implies that the empty string is not a member of the language of the grammar: ε ∉ L(G).
Now look at the production rules of the start variable S:
S → aS | bX | a
It is then clear that a string ω in the language can start with either 'a' or 'b': you have two choices when applying S productions, either (1) S → aS | a, which gives 'a' as the first symbol of ω, or (2) S → bX, which produces strings that start with 'b'.
Now, what is the shortest possible string ω in L(G)? It is "a", obtained with the production rule S → a.
Next, note that "b" ∉ L(G): if you apply S → bX, you later have to replace X in the sentential form bX using one of X's production rules, and since X is not nullable there will always be some symbol(s) after 'b'. In other words, the sentential form bX derives only strings with ∣ω∣ ≥ 2.
From the above discussion, it is clear that the S production rules generate sentential forms of the shape a*a or a*bX, in two steps:
For a*, use S → aS repeatedly, which gives S ⇝ a*S (the symbol ⇝ means any number of derivation steps).
Replace S on the right-hand side of S ⇝ a*S to get either a*a or a*bX.
Also, "a*a or a*bX" can be written as S ⇝ a*(a + bX), or S ⇝ (a*(a + bX)) if you like to parenthesize the complete expression✎.
Now compare the production rules of S and X: they have the same form! So, as shown above for S, X generates the sentential forms X ⇝ (a*(a + bY)).
To derive the regular expression given in the answer, replace X by (a*(a + bY)) in S ⇝ a*(a + bX). You get:
S ⇝ a*(a + b X )
S ⇝ a*(a + b (a*(a + bY)) )
Finally, the Y production rules are comparatively simple: they just create the "plus closure" a+ (or a*a).
So let's replace Y as well in the sentential form derived from S.
S ⇝ a*(a + b(a*(a + bY)))
⇝ a*(a + b(a*(a + ba*a)))
Simplify it: apply the distributive law twice to remove the inner parentheses and concatenate the regular expressions. P(Q + R) can be written as PQ + PR.✞
⇝ a*(a + b(a*(a + ba*a)))
⇝ a*(a + b(a*a + a*ba*a))
⇝ a*(a + ba*a + ba*ba*a)
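As a sanity check, the equivalence can be tested by brute force. The following is only a minimal Python sketch and not part of the textbook solution: the names GRAMMAR, derives, ANSWER and agree_up_to are made up for illustration, and union is written as | because Python's re module uses + only for plus closure.

```python
import re
from itertools import product

# Right-linear grammar from the question (hypothetical encoding):
# each production is (terminal, next variable), with None when the rule ends.
GRAMMAR = {
    "S": [("a", "S"), ("b", "X"), ("a", None)],
    "X": [("a", "X"), ("b", "Y"), ("a", None)],
    "Y": [("a", "Y"), ("a", None)],
}

def derives(var, word):
    """Return True if variable `var` derives the string `word`."""
    if not word:
        return False  # no variable is nullable
    head, rest = word[0], word[1:]
    for terminal, nxt in GRAMMAR[var]:
        if terminal != head:
            continue
        if nxt is None and not rest:
            return True
        if nxt is not None and derives(nxt, rest):
            return True
    return False

# The textbook answer a*(a + ba*a + ba*ba*a), with union written as |.
ANSWER = re.compile(r"a*(a|ba*a|ba*ba*a)")

def agree_up_to(max_len):
    """Compare grammar and regex on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if derives("S", word) != bool(ANSWER.fullmatch(word)):
                return False
    return True

print(agree_up_to(10))  # True
```

A bounded check like this is not a proof, but a mismatch on short strings would show up immediately.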
✎ : + in regular expressions in formal languages is used in two ways: (i) + as a binary operator means "union"; (ii) + as a unary superscript operator means "plus closure".
✎ : In regex in programming languages, + is used only for "plus closure".
✞ : In regex we use the ∣ symbol for union, but it is not exactly a union operator. The union (A ∪ B) is the same as (B ∪ A), but in regex (A ∣ B) may not equal (B ∣ A).
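To see the last footnote in action, here is a small sketch using Python's re module (one regex engine among many; others may behave differently). With a prefix match, the alternatives are tried from left to right, so swapping them changes what is matched.

```python
import re

text = "ab"

# Prefix match: the first alternative that succeeds wins.
first = re.match(r"a|ab", text).group()
second = re.match(r"ab|a", text).group()
print(first, second)  # a ab

# If the whole string must match, both orders accept the same strings.
print(bool(re.fullmatch(r"a|ab", text)), bool(re.fullmatch(r"ab|a", text)))  # True True
```

So the set of strings accepted in full is the same either way; only the choice of match inside a longer text depends on the order.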
What you can observe from the question is that the grammar, apart from being a CFG, is also right-linear. So you can construct a finite automaton for this right-linear grammar. Once you have the finite automaton, there exists a regular expression for the same language, and the conversion can be done using the steps given in this site.
Okay, so in programming the logical OR symbol (typically ||) when applied to operands a and b, that is, a || b, means that either a or b can be true, OR both can be true. If you want only one to be true, you use XOR (sometimes, the ^ symbol.)
However, in formal language theory, the concept of OR (typically the + symbol) seems to imply exclusive-or (xor) instead of regular OR. For example, if we describe a language L with a regular expression aa + bb + ab, a valid string (word) from the language would be one of those (aa, bb, or ab), not some concatenation of them. To do that, you must use the Kleene closure, as in (aa + bb + ab)*, right?
Perhaps I'm just thinking of + as being defined in a peculiar way, or perhaps it's that the operands are no longer Boolean?
I'm just looking for verification if I seem to be understanding that + (OR) has a seemingly different meaning in formal language / computational modeling than it does in programming languages. Thanks!
The formal language OR is an inclusive ("regular") OR. E.g., the regular language ab* + a*b includes strings that are in both ab* and a*b (i.e., the string ab).
The problem is not with the operator - the + in regular expressions really does mean the same thing as union of sets - the problem is with your understanding of the operands. Specifically, in your regular expression, aa + bb + ab, aa does not represent a string over your alphabet, but a sub-regular expression. Regular expressions describe sets of strings; so the regular expression aa describes the set of strings {aa}. So, the regular expression aa + bb + ab describes the set of strings {aa} union {bb} union {ab} = {aa, bb, ab}. The exclusive-or of set theory, symmetric difference, does not have an operator in the regular expression syntax. We can recursively define the language of a regular expression, written L(r) for regular expression r, as follows:
L(r) = {r}, if r is a string over the alphabet;
L(r) = L(s)L(t) if r = st;
L(r) = L(s)* if r = s*;
L(r) = L(s) union L(t) if r = s + t.
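The recursive definition above can be sketched directly in Python. This is only an illustration under my own assumptions: regular expressions are encoded as nested tuples ("union", s, t), ("cat", s, t) and ("star", s), with plain strings as literals, and the result is cut off at max_len because L(s)* is infinite.

```python
def lang(r, max_len):
    """Strings of length <= max_len in L(r), for a tuple-encoded regex r."""
    if isinstance(r, str):                     # L(r) = {r}
        return {r} if len(r) <= max_len else set()
    op = r[0]
    if op == "union":                          # L(s + t) = L(s) union L(t)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if op == "cat":                            # L(st) = L(s)L(t)
        return {u + v
                for u in lang(r[1], max_len)
                for v in lang(r[2], max_len)
                if len(u + v) <= max_len}
    if op == "star":                           # L(s*) = L(s)*
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op!r}")

# aa + bb + ab is a plain union of three one-element sets.
print(sorted(lang(("union", ("union", "aa", "bb"), "ab"), 4)))  # ['aa', 'ab', 'bb']

# ab* + a*b: the string ab belongs to both operands, so the OR is inclusive.
ab_star = ("cat", "a", ("star", "b"))
a_star_b = ("cat", ("star", "a"), "b")
print(lang(ab_star, 3) & lang(a_star_b, 3))  # {'ab'}
```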
I'm translating some Fortran into Javascript and the order of operations for exponents is pretty opaque to me for a particular class of equations.
Here's an example of the Fortran equation:
x = 1+a*b**c*c**d
Exponents in Fortran are designated with the ** operator. This page gives some hints:
Arithmetic expressions are evaluated in accordance with the following priority rules:
All exponentiations are performed first; consecutive exponentiations are performed from right to left.
All multiplication and divisions are performed next, in the order in which they appear from left to right.
So it feels to me like to translate this into, say, Python, you'd end up with something like:
x = 1+a*pow(b,c)*pow(c,d)
But that isn't getting me the answers I'd expect it to, so I wanted to check if that seemed sane or not (because order of operations was never a strong suit of mine even in the best of circumstances, certainly not with languages I am not very familiar with).
Here's another puzzler:
x = a*(1-(1-a)**b)+(1-a)*(a)**(1/b)
Oy! This hurts my head. (Does putting that lone (a) in parens matter?) At least this one has parens, which suggests:
x = a*(1-pow(1-a,b))+(1-a)*pow(a,1/b)
But I'm still not sure I understand this.
You can look at this in terms of operator precedence. ** has higher precedence than *, which in turn has higher precedence than binary +. The expression 1+a*b**c*c**d is then like 1+a*(b**c)*(c**d).
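Python's ** follows the same rules as described here (higher precedence than *, and grouping from right to left), so the translation can be checked there directly. A small sketch with arbitrarily chosen sample values:

```python
import math

# Arbitrary sample values, chosen as integers so the comparison is exact.
a, b, c, d = 2, 3, 2, 3

implicit = 1 + a * b ** c * c ** d
explicit = 1 + ((a * (b ** c)) * (c ** d))
with_pow = 1 + a * pow(b, c) * pow(c, d)
print(implicit, explicit, with_pow)  # 145 145 145

# Consecutive exponentiations group from right to left.
print(2 ** 3 ** 2, 2 ** (3 ** 2), (2 ** 3) ** 2)  # 512 512 64

# Second expression from the question, with arbitrary float inputs.
p, q = 0.5, 2.0
direct = p * (1 - (1 - p) ** q) + (1 - p) * (p) ** (1 / q)
rewritten = p * (1 - pow(1 - p, q)) + (1 - p) * pow(p, 1 / q)
print(math.isclose(direct, rewritten))  # True
```

The parentheses around the lone (a) make no difference, and the pow() translations in the question match the operator form.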
More formally...
Expressions in Fortran are described by a grammar consisting of primaries and level-1 to level-5 expressions. Let's look at these using the example of the question
x = 1+a*b**c*c**d
the right-hand side of this is the expression of interest. As this is a so-called level-2 expression I'll explain only in terms up to that.
The primaries of this expression are the constant 1 and the variables a, b, c and d. We also have a number of operators +, ** and *.
The primaries are all level-1 expressions, and as there is no defined unary operator here we'll go straight to the level-2 expressions.
A level-2 expression is defined by the rules (quoting F2008 7.1.2.4):
mult-operand is level-1-expr [ power-op mult-operand ]
add-operand is [ add-operand mult-op ] mult-operand
level-2-expr is [ [ level-2-expr ] add-op ] add-operand
where power-op is **, mult-op * (or /) and add-op + (or -).
Putting the parsing together we have:
1 + add-operand
1 + ( add-operand * mult-operand )
1 + ( ( a * mult-operand ) * ( mult-operand ) )
1 + ( ( a * ( b ** c ) ) * ( c ** d ) )
As a final note, an expression enclosed in parentheses is a primary. This ensures that the expectation of precedence with parentheses is preserved.
This grammar also explains why a**b**c is evaluated as a**(b**c).
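The three rules quoted above translate almost line by line into a small recursive-descent parser. The following Python sketch is my own illustration, not anything from the standard: it handles only names, numbers, parentheses and the operators + - * / **, ignores the optional leading sign of a level-2 expression, and returns a fully parenthesized string instead of evaluating anything.

```python
import re

TOKEN = re.compile(r"\*\*|[-+*/()]|\w+")   # try ** before the single *

def tokenize(text):
    return TOKEN.findall(text)

def parse_primary(tokens, pos):
    tok = tokens[pos]
    if tok == "(":                          # a parenthesized expression is a primary
        inner, pos = parse_level2(tokens, pos + 1)
        return inner, pos + 1               # skip the closing ")"
    return tok, pos + 1

def parse_mult_operand(tokens, pos):
    # mult-operand is level-1-expr [ power-op mult-operand ]  (right recursive)
    left, pos = parse_primary(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "**":
        right, pos = parse_mult_operand(tokens, pos + 1)
        return f"({left} ** {right})", pos
    return left, pos

def parse_add_operand(tokens, pos):
    # add-operand is [ add-operand mult-op ] mult-operand  (left recursive)
    left, pos = parse_mult_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_mult_operand(tokens, pos + 1)
        left = f"({left} {op} {right})"
    return left, pos

def parse_level2(tokens, pos=0):
    # level-2-expr is [ level-2-expr add-op ] add-operand  (leading sign omitted)
    left, pos = parse_add_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_add_operand(tokens, pos + 1)
        left = f"({left} {op} {right})"
    return left, pos

def parenthesize(text):
    tree, _ = parse_level2(tokenize(text))
    return tree

print(parenthesize("1+a*b**c*c**d"))  # (1 + ((a * (b ** c)) * (c ** d)))
print(parenthesize("a**b**c"))        # (a ** (b ** c))
```

The right recursion in parse_mult_operand is what makes ** group from right to left, while the loops in the other two functions give the left-to-right grouping of * and +.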
To illustrate:
Fortran: 3**j**a**1 === Math: 3^(j^(a^1))
1+a*b**c*c**d === 1 + a×b^c×c^d
Do spaces have any meaning in these expressions:
assume:
int a = 1;
int b = 2;
1)
int c = a++ +b;
Or,
2)
int c = a+ ++b;
When I run these two in Visual Studio, I get different results. Is that the correct behavior, and what does the spec say?
In general, what should be evaluated first, post-increment or pre-increment?
Edit: I should say that
c =a+++b;
does not compile in Visual Studio. But I think it should. The postfix ++ seems to be evaluated first.
Is that the correct behavior
Yes, it is.
Postfix ++ first returns the current value, then increments it. So int c = a++ +b means: compute c as the sum of the current value of a (take the current value of a, and only after taking it, increment a) and b.
Prefix ++ first increments the current value, then returns the already incremented value. So in this case int c = a+ ++b means: compute c as the sum of a and the result of the expression ++b, which means b is first incremented, then returned.
In general, what should be evaluated first, post-increment or pre-increment?
In this example, it is not about which gets evaluated first, it is about what each does - postfix first returns the value, then increments it; prefix first increments the value, then returns it.
Hth
Maybe it helps to understand the general architecture of how programs are parsed.
In a nutshell, there are two stages to parsing a program (C++ or others): lexer and parser.
The lexer takes the text input and maps it to a sequence of symbols. This is when spaces are handled because they tell where the symbols are. Spaces really matter at some places (like between int and c, to not confuse with the symbol intc) but not others (like between a and ++ because there is no ambiguity to separate them).
The first example:
int c = a++ +b;
gives the following symbols, each on its own row (implementations may do this in slightly different ways of course):
int
c
=
a
++
+
b
;
While in the other case:
int c = a+ ++b;
the symbols are instead:
int
c
=
a
+
++
b
;
The parser then builds a tree (Abstract Syntax Tree, AST) out of the symbols and according to some grammar. In particular, according to the C++ grammar, + as an addition has a lower precedence than the unary ++ operator (regardless of postfix or prefix). This means that the first example is semantically the same as (a++) + b while the second is like a+ (++b).
For your examples, the ASTs will be different, because the spaces already lead to a different output at the lexer phase.
Note that the space is not required in the first example: a+++b is read as a ++ + b, because the lexer always takes the longest possible symbol, so it means the same as a++ +b (and not a+ ++b). Writing it that way is not recommended for readability, though. So some spaces matter for technical reasons, while others matter so that we users can read the code.
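The lexer stage can be imitated with a few lines of Python. This is a toy sketch, not how a real C++ compiler is written: the TOKEN pattern is a made-up one that only knows the symbols used in this question, and it tries ++ before + so that the longest symbol wins.

```python
import re

# Toy token pattern (assumption): ++ is listed before + so it is preferred.
TOKEN = re.compile(r"\+\+|\+|=|;|\w+")

def tokenize(source):
    """Split source into symbols; whitespace only separates them."""
    return TOKEN.findall(source)

print(tokenize("int c = a++ +b;"))  # ['int', 'c', '=', 'a', '++', '+', 'b', ';']
print(tokenize("int c = a+ ++b;"))  # ['int', 'c', '=', 'a', '+', '++', 'b', ';']
print(tokenize("a+++b"))            # ['a', '++', '+', 'b']
```

The two statements from the question already differ at this stage, and the version without spaces produces the same symbols as the first one.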
Yes they should be different; the behaviour is correct.
There are a few possible sources for your confusion.
This question is not about "spaces in operators". You have different operators. If you were to remove the space, you would have a different question. See What is i+++ increment in c++
It's also not about "what should be evaluated first, post-increment or pre-increment". It's about understanding the difference between post-increment and pre-increment.
Both increment the variable to which they apply.
But the post-increment expression returns the value from before the increment.
Whereas the pre-increment expression returns the value after the increment.
I.e.
//Given:
int a = 1;
int b = 2;
//Post-increment
int c = a++ +b; =>
1 + 2; (and a == 2) =>
3;
//Pre-increment
int c = a+ ++b; =>
1 + 3; (and b == 3) =>
4;
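The same two evaluations can be replayed in Python. Python has no ++ operator, so this sketch imitates it with a hypothetical Cell class whose post_inc and pre_inc methods follow the description above; it is only a model of the semantics, not C++ code.

```python
class Cell:
    """Hypothetical mutable integer, used only to imitate ++ in Python."""

    def __init__(self, value):
        self.value = value

    def post_inc(self):
        # Like x++: remember the old value, increment, return the old value.
        old = self.value
        self.value += 1
        return old

    def pre_inc(self):
        # Like ++x: increment, then return the new value.
        self.value += 1
        return self.value

a, b = Cell(1), Cell(2)
c1 = a.post_inc() + b.value      # models: int c = a++ + b;
print(c1, a.value)               # 3 2

a, b = Cell(1), Cell(2)
c2 = a.value + b.pre_inc()       # models: int c = a + ++b;
print(c2, b.value)               # 4 3
```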
Another thing that might be causing confusion. You wrote: a++ +b;. And you may be assuming that +b is the unary + operator. This would be an incorrect assumption because you have both left and right operands making that + a binary additive operator (as in x + y).
Final possible confusion. You may be wondering why:
in a++ +b the ++ is a post-increment operator applied to a.
whereas in a+ ++b it's a pre-increment operator applied to b.
The reason is that ++ has higher precedence than the binary additive +. And in both cases it would be impossible to apply ++ to +.
If I define a function in OCaml, for example let f x = x + 1;; and then I try to call it passing a negative number
f -1;; it gives me the following error:
Error: This expression has type int -> int
but an expression was expected of type int
Why does this error occur?
Basically, it comes from operator precedence in the parser. The compiler believes that f -1 means you want to subtract 1 from f. This has been complained about for ages.
Typing f (-1) or f ~-1 will solve your problem (the latter uses the explicit unary minus).
UPDATE:
As stated in the OCaml manual:
Unary negation. You can also write - e instead of ~- e.
Basically, - can be used both as a binary operator 4 - 1 and a unary operator -1. But, as in your case, there can be confusion: f - 1 is "f minus one" and not "f applied to minus one". So the ~- operator was added to have a non-confusing unary minus as well.
Note that the spaces are not significant here, and that won't change because a lot of already existing code may contain operations without space.
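As a loose analogy only (Python is not OCaml and has no application by juxtaposition), the same reading of the minus sign can be observed in Python, where f -1 is also parsed as the binary subtraction f - 1 regardless of spacing. A minimal sketch:

```python
def f(x):
    return x + 1

try:
    f -1                 # parsed as (f) - (1), not as f applied to -1
    parsed_as_subtraction = False
except TypeError:
    # Subtracting an int from a function object is not defined.
    parsed_as_subtraction = True

print(parsed_as_subtraction)  # True
print(f(-1))                  # 0
```

In both languages the fix is the same idea: make the argument explicit, here with f(-1), in OCaml with f (-1) or f ~-1.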
I am as much a newbie to Pi-Calculus as I am to Backus-Naur Form.
Here is one of the core BNF definitions for the Pi-Calculus (found in "Applied Pi - A Brief Tutorial" by Peter Sewell):
P,Q ::= 0             nil
        P | Q         parallel composition of P and Q
        ~cv           output v on channel c
        cw.P          input from channel c
        new c in P    new channel name creation
Indeed, I am focused on learning Pi-Calculus, but I do wonder about the meaning of P,Q ::= in the BNF definition.
I would understand P ::= meaning that a process P of Pi-Calculus is this or this or this.
But what does P,Q ::= stand for?
Here, this means that the letters P and Q are both used to denote processes. For example, in P | Q, P is a process and Q is a process. The author could have written
P ::= 0
      P1 | P2
      ~cv
      cw.P
      new c in P
but preferred to allow two distinct letters to refer to the same concept in order to make formulas a bit more readable.
By the way, classically the alternatives in BNF are separated by a vertical bar; but since the vertical bar | has a meaning in pi-calculus, the author didn't want to use them both in their pi-calculus meaning and in their BNF meaning. The definition should still be read as “a process is either nil, or a parallel composition, or …”.
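If it helps to think of it as a data type, here is one possible rendering of that definition in Python. The class and field names (Nil, Par, Output, Input, New, Process, and so on) are my own choices and do not come from the tutorial. The point is that the two sides of a parallel composition have the same type: P and Q are just two names for "some process".

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Nil:                 # 0
    pass

@dataclass(frozen=True)
class Par:                 # P | Q
    left: "Process"        # the P
    right: "Process"       # the Q: same type as P

@dataclass(frozen=True)
class Output:              # ~cv
    channel: str
    value: str

@dataclass(frozen=True)
class Input:               # cw.P
    channel: str
    variable: str
    body: "Process"

@dataclass(frozen=True)
class New:                 # new c in P
    channel: str
    body: "Process"

# "A process is either nil, or a parallel composition, or ..."
Process = Union[Nil, Par, Output, Input, New]

# ~cv | cw.0
example = Par(Output("c", "v"), Input("c", "w", Nil()))
print(example)
```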