Why are literals considered expressions in C++? - c++

I'm studying the C++ programming language using Programming: Principles and Practice Using C++.
I'm in chapter 4 now, and in this chapter the book introduces the concept of an expression, but I can't understand it at all:
The most basic building block in a program is an expression. An expression computes a value from a number of operands. The simplest expression in C++ is simply a literal value such as 11, 'c', "hello". Names of variables are also expressions. A variable represents the object of which it is the name.
Why is a literal considered an expression? Why is the name of a variable considered an expression?

Expressions (in programming languages, in math, in linguistics) are defined compositionally (or inductively). Expressions are often made of sub-expressions: x*2+y*4, for example, is made of the two sub-expressions x*2 and y*4 joined by the addition operator +.
But you need a base case (the most atomic and simple expressions). These are literals (2) and variables (x). If either of them were not an expression, 2*x could not be an expression (since both operands of the binary multiplication * are sub-expressions).
Notice that in C and C++ assignments and function calls are expressions too.
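To make the base case concrete, here is a small sketch. The names `twiceOf` and `demoExpressions` are made up for illustration; the point is only that a literal, a variable name, an assignment and a function call can each appear wherever an expression is expected.

```cpp
#include <cassert>

// Hypothetical helper, used only to show that a function call is an expression.
int twiceOf(int value) { return 2 * value; }

int demoExpressions() {
    int x = 5;
    int y = 0;
    int a = 7;            // the literal 7 is the whole expression
    int b = x;            // the name x is the whole expression
    int c = 2 * x;        // the literal 2 and the name x are the operands of *
    int d = (y = x + 1);  // an assignment is an expression; its value is the new y
    int e = twiceOf(y);   // a function call is an expression
    assert(a == 7 && b == 5 && c == 10 && d == 6 && e == 12);
    return x * 2 + y * 4; // built from the sub-expressions x*2 and y*4
}
```

If 2 or x were not expressions, none of the initializers above could be described by a single rule of the form "operand operator operand".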

Think of it like this: An expression is a sequence of steps that produce a value. Thus, 4+3 is a two-step expression, because you (1) start with the number 4, and (2) add 3 to it.
Therefore, 7 can be regarded as a single-step sequence, because there is only one "action" performed: (1) start with the number 7.
Thus, both a = 4+3; and a = 7; can be generalised to a = <expression>;.

An expression is "a sequence of operators and operands that specifies a computation" (http://en.cppreference.com/w/cpp/language/expressions).
Let's look at a simple expression: 3 + 3. When you evaluate this expression, you get the result 6.
Now let's look at another expression: 3. When you evaluate this expression, you get the result 3.
A literal is considered an expression because a literal is a type of constant and constants are expressions with a fixed value.
A variable is also considered as an expression because it can be used as an operand within another expression or as an expression by itself.
In software design, the composite pattern can be used to represent expressions.
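As a rough illustration of that last remark, the sketch below models expressions with a composite: literals and variables are the leaves, and an operator node holds two sub-expressions. All type and function names here (`Expr`, `Literal`, `Variable`, `Binary`, `Environment`, `buildExample`) are assumptions for this example, not anything from the book or the standard. It needs C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Variable bindings used during evaluation (a hypothetical, minimal environment).
using Environment = std::map<std::string, int>;

// Component: every node of the tree is an expression that can be evaluated.
struct Expr {
    virtual ~Expr() = default;
    virtual int evaluate(const Environment& env) const = 0;
};

// Leaf: a literal evaluates to itself.
struct Literal : Expr {
    int value;
    explicit Literal(int v) : value(v) {}
    int evaluate(const Environment&) const override { return value; }
};

// Leaf: a variable evaluates to the value bound to its name.
struct Variable : Expr {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    int evaluate(const Environment& env) const override { return env.at(name); }
};

// Composite: an operator applied to two sub-expressions.
struct Binary : Expr {
    char op;  // '+' or '*'
    std::unique_ptr<Expr> left, right;
    Binary(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), left(std::move(l)), right(std::move(r)) {
        assert(op == '+' || op == '*');
    }
    int evaluate(const Environment& env) const override {
        const int l = left->evaluate(env);
        const int r = right->evaluate(env);
        return op == '+' ? l + r : l * r;
    }
};

// Builds the tree for x*2 + y*4.
std::unique_ptr<Expr> buildExample() {
    auto lhs = std::make_unique<Binary>('*', std::make_unique<Variable>("x"),
                                        std::make_unique<Literal>(2));
    auto rhs = std::make_unique<Binary>('*', std::make_unique<Variable>("y"),
                                        std::make_unique<Literal>(4));
    return std::make_unique<Binary>('+', std::move(lhs), std::move(rhs));
}
```

Because `Literal` and `Variable` implement the same interface as `Binary`, they can be used as operands without any special casing, which is the same reason the language treats them as expressions.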

Related

In C++, what is the difference between an expression and a literal?

My book defines an expression as "a programming statement that has a value" and a literal as "a piece of data that is written directly into a program's source code", but I'm still having some trouble distinguishing between the two. For example, is 3+3 a literal AND an expression, or just an expression? Why?
int number = 2+2;
Is this whole statement an expression, or just the right value? Why? This whole statement has a value of 4, so surely the whole statement is an expression?
In my mind, an expression usually involves operators and a literal involves a single piece of data like 4, "Hello", 'A', etc. I also understand that a literal can be an expression because of unary operators such as - or +. Am I correct in thinking this?
An expression is a sequence of operators and operands that specifies a computation. An expression can result in a value and can cause side effects.
A literal is one of the following:
integer literal
character literal
floating point literal
string literal
boolean literal
pointer literal
I won't try to give the formal definition of each of these, but each is basically just a value.
There's one more type of literal that's somewhat special though:
user-defined literal
Although user-defined literals are literals, the value of the literal is defined in terms of the result of evaluating an expression.
References:
Expressions: [expr]
Literals: [lex.literal]
(For those unfamiliar with it, the tag in square brackets is the notation used to specify sections in the C++ standard).
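A minimal sketch of the user-defined literal case, assuming a made-up suffix `_km` that converts kilometres to metres. The value of the literal `3_km` is whatever the literal operator function returns, so it is defined through an expression evaluation, yet it is still used as a plain operand.

```cpp
#include <cassert>

// Hypothetical suffix _km: the literal's value is the result of this function.
constexpr unsigned long long operator""_km(unsigned long long kilometres) {
    return kilometres * 1000ULL;  // metres
}

unsigned long long distanceInMetres() {
    // Writing 3_km is treated as a call to the literal operator with argument 3ULL.
    assert(3_km == operator""_km(3ULL));
    return 3_km + 500;  // the user-defined literal is an operand like any other
}
```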
A literal is something like the number 7 for example. When converted to assembly code, the literal 7 remains quite visible in the code:
MOV R1, 7 ; move the number 7 as a value into register R1
An expression is something that needs to be evaluated. Generally, you'll find something along the lines of C=A+B; where A+B is an expression.
An expression is a sequence of operators and their operands, that
specifies a computation. Expression evaluation may produce a result
(e.g., evaluation of 2+2 produces the result 4) and may generate
side-effects (e.g. evaluation of std::printf("%d",4) prints the
character '4' on the standard output).
http://en.cppreference.com/w/cpp/language/expressions
http://en.cppreference.com/w/cpp/language/expressions#Literals

Do parentheses force order of evaluation and make an undefined expression defined?

I was just going though my text book when I came across this question:
What would be the value of a after the following expression? Assume the initial value of a = 5. Mention the steps.
a+=(a++)+(++a)
At first I thought this is undefined behaviour because a is modified more than once. Then I read the question again, and since it says "Mention the steps", I thought the question might be valid after all.
Does applying parentheses make an undefined behaviour defined?
Is a sequence point created after evaluating a parentheses expression?
If it is defined, how do the parentheses matter, since ++ and () have the same precedence?
No, applying parentheses doesn't make it a defined behaviour. It's still undefined. The C99 standard §6.5 ¶2 says
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored.
Putting a sub-expression in parentheses determines how operands are grouped with operators, but it does not create a sequence point. Therefore, it does not guarantee when the side effects of the sub-expressions, if they produce any, will take place. Quoting the C99 standard again, §5.1.2.3 ¶2:
Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and no side
effects of subsequent evaluations shall have taken place.
For the sake of completeness, following are sequence points laid down by the C99 standard in Annex C.
The call to a function, after the arguments have been evaluated.
The end of the first operand of the following operators: logical AND &&; logical OR ||; conditional ?; comma ,.
The end of a full declarator.
The end of a full expression; the expression in an expression statement; the controlling expression of a selection statement (if
or switch); the controlling expression of a while or do
statement; each of the expressions of a for statement; the
expression in a return statement.
Immediately before a library function returns.
After the actions associated with each formatted input/output function conversion specifier.
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any
movement of the objects passed as arguments to that call.
Adding parentheses does not create a sequence point, and in the more modern standards it does not create a sequenced-before relationship with respect to side effects, which is the problem with the expression that you have. Unless noted, the rest of this will be with respect to C++11. A parenthesized expression is a primary expression, covered in section 5.1 Primary expressions, which has the following grammar (emphasis mine going forward):
primary-expression:
literal
this
( expression )
[...]
and in paragraph 6 it says:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. The presence of parentheses does not affect whether the expression is an lvalue. The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
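A small sketch of what that paragraph means in practice (the function name is made up for illustration): wrapping an expression in parentheses leaves its value unchanged, and an lvalue stays an lvalue.

```cpp
#include <cassert>

int parenthesesKeepValueAndLvalue() {
    int a = 5;
    assert((a) == a);           // same value as the enclosed expression
    assert((2 + 3) * 2 == 10);  // here the parentheses only change the grouping
    (a) = 7;                    // (a) is still an lvalue, so it can be assigned to
    int* p = &(a);              // and its address can be taken
    *p += 1;
    return a;
}
```

Nothing in this sketch depends on when a side effect happens, which is why parentheses alone cannot repair an expression like the one in the question.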
The postfix ++ is problematic since, before C++11, we cannot determine when the side effect of updating a will happen; in C this applies to both the postfix ++ and the prefix ++ operations. For how undefined behavior changed for prefix ++ in C++11, see Assignment operator sequencing in C11 expressions.
The += operation is problematic since:
[...]E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is
evaluated only once[...]
So in C++11 the following went from undefined to defined:
a = ++a + 1 ;
but this remains undefined:
a = a++ + 1 ;
and both of the above are undefined pre C++11 and in both C99 and C11.
The draft C++11 standard, section 1.9 Program execution, paragraph 15 says:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
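To see why "mention the steps" has no single answer, it can help to write the steps out as separate statements, each of which is a full expression and therefore fully sequenced. The sketch below does that for two hypothetical readings of a+=(a++)+(++a), which differ only in when the left operand of += is read. The function name and the flag are assumptions for illustration. Undefined behaviour is not limited to these two outcomes, and a compiler is not obliged to produce either of them for the original expression.

```cpp
#include <cassert>

// Spells out one possible step-by-step reading with well-defined statements.
int oneSequencedReading(bool readLeftOperandFirst) {
    int a = 5;
    int lhs = a;      // reading 1: left operand of += is read before anything else (5)
    int first = a;    // value of a++ is the old value (5)
    a = a + 1;        // side effect of a++
    a = a + 1;        // side effect of ++a
    int second = a;   // value of ++a is the new value (7)
    assert(a == 7);
    if (!readLeftOperandFirst) {
        lhs = a;      // reading 2: left operand is read after both increments (7)
    }
    a = lhs + first + second;
    return a;
}
```

With the flag set the result is 17, without it 19. Both are "reasonable" step lists, which is exactly what the unsequenced side effects fail to pin down.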

Is there a regular language to represent regular expressions?

Specifically, I noticed that the language of regular expressions itself isn't regular. So, I can't use a regular expression to parse a given regular expression. I need to use a parser since the language of the regular expression itself is context free.
Is there any way regular expressions can be represented in a way that the resulting string can be parsed using a regular expression?
Note: My question isn't about whether there is a regexp to match the current syntax of regexes, but whether there exists a "representation" for regular expressions as we know them today (maybe not as neat as what we know today) that can be parsed using regular expressions. Also, please could someone remove the dup since it isn't a dup. I'm asking something completely different. I already know that the current language of regular expressions isn't regular (it is how I started my original question).
Depending on what you mean by "represent", the answer is "yes" or "no":
If you want a language that (homomorphically) maps 1:1 to the usual basic regular expression language, the answer is no, because a regular language cannot be isomorphic to a non-regular language, and the standard regular expression language is non-regular. This is because the syntax requires matching opening and closing parentheses of arbitrary depth.
If "represent" only means another method of specifying regular languages, the answer is yes, and right now I can think of at least three ways to achieve this:
The "dumbest" and easiest way is to define some surjective mapping f : ℕ -> RegEx from the natural numbers onto the set of all valid standard regular expressions. You can define the natural numbers using the regular expression 0|1[01]*, and the regular language denoted by a (string representing the) natural number n is the regular language denoted by f(n).
Of course, the meaning attached to a natural number would not be obvious to a human reader at all, so this "regular expression language" would be utterly useless.
As parentheses are the only non-regular part in simple regular expressions, the easiest human-interpretable method would be to extend the standard simple regular expression syntax to allow dangling parentheses and defining semantics for dangling parentheses.
The obvious choice would be to ignore non-matching opening parentheses and interpreting non-matching closing parentheses as matching the beginning of the regex. This essentially amounts to implicitly inserting as many opening parentheses at the beginning and as many closing parentheses at the end of the regex as necessary. Additionally, (* would have to be interpreted as repetition of the empty string. If I didn't miss anything, this definition should turn any string into a "regular expression" with a specified meaning, so .* defines this "regular expression language".
This variant even has the same abstract syntax as standard regular expressions.
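The "implicitly insert the missing parentheses" reading of the dangling-parentheses variant can be sketched as a small normalisation step. This is only an illustration under simplifying assumptions: the function name `balanceParentheses` is made up, escaped parentheses and character classes are not handled, and the special treatment of (* is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Adds as many '(' at the front and ')' at the end as needed to balance the input.
std::string balanceParentheses(const std::string& pattern) {
    std::size_t missingOpen = 0;  // ')' seen with no open '(' to match
    std::size_t stillOpen = 0;    // '(' seen and not yet closed
    for (char ch : pattern) {
        if (ch == '(') {
            ++stillOpen;
        } else if (ch == ')') {
            if (stillOpen > 0) --stillOpen;
            else ++missingOpen;
        }
    }
    std::string result = std::string(missingOpen, '(') + pattern + std::string(stillOpen, ')');
    assert(result.size() == pattern.size() + missingOpen + stillOpen);
    return result;
}
```

For example, the input a)b(c would be read as (a)b(c). Every input string gets some balanced reading, which is what makes the set of accepted inputs regular.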
Another variant would be to specify the NFA that recognizes the language directly using a regular language, e.g.: ([a-z]+,([^,]|\\,|\\\\)+,[a-z]+\$?;)*.
The idea is that [a-z]+ is used as a label for states, and the expression is a list of transition triples (s, c, t) from source state s to target state t consuming character c, with a $ indicating accepting transitions (cf. note below). In c, backslashes are used to escape commas or backslashes. I assumed that you use the same alphabet as for standard regular expressions, but of course you can replace the middle component with any other regular language of symbols denoting characters of any alphabet you wish.
The first source state mentioned is the (single) initial state. An empty expression defines the empty language.
Above, I wrote "accepting transition", not "accepting state" because that would make the regex above a bit more complex. You can interpret a triple containing a $ as two transitions, namely one transition consuming c from s to a new, unique state, and an ε-transition from that state to t. This should allow any NFA to be represented, by replacing each transition to an accepting state with a $ triple and each transition to a non-accepting state with a non-$ triple.
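Since the triple format above is itself given by a regular expression, checking whether a string is a well-formed NFA description needs no parser. A minimal sketch with `std::regex` follows; the function names and the sample descriptions are made up for illustration, and the check only validates the syntax, it does not simulate the automaton.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if text is a list of "source,char,target;" triples in the format above.
bool isNfaDescription(const std::string& text) {
    static const std::regex format(R"(([a-z]+,([^,]|\\,|\\\\)+,[a-z]+\$?;)*)");
    return std::regex_match(text, format);
}

// Hypothetical sample inputs.
void exampleDescriptions() {
    assert(isNfaDescription("start,a,end$;"));         // one accepting transition
    assert(isNfaDescription("p,a,q;q,b,q$;"));         // two transitions
    assert(isNfaDescription(""));                      // empty list: the empty language
    assert(!isNfaDescription("p,a,q"));                // missing terminating ';'
    assert(!isNfaDescription("P,a,q;"));               // state labels are lower case
}
```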
One note that might make the "yes" part look more intuitive: Assembly languages are regular, and those are even Turing-complete, so it would be unexpected if it wasn't possible to specify "mere" regular languages using a regular language.
The answer is probably NO.
As you have pointed out, the set of all possible regular expressions is itself not a regular set. Any TRUE regular expression (not the extended kind) can be converted into a finite automaton (FA). If regular expressions could be represented in a form that can be parsed by a regular expression, then an FA could be parsed by a regular expression as well.
But that's not possible as far as I know. An RE can be reduced to three basic operations (according to the Dragon Book):
concatenation: e.g. ab
alternation: e.g. a|b
Kleene closure: e.g. a*
The Kleene closure can match an unbounded number of characters, but it cannot know how many characters to match.
Consider this case: you want to match 3 consecutive a's. Then the corresponding regular expression is /aaa/. But what if you want to match 4, 5, 6... a's? A parser with only one RE cannot know the exact number of a's, so it fails to give the right match for arbitrary expressions. However, an RE parser has to match infinitely many different forms of REs. Following your reasoning, a regular expression cannot match all the possibilities.
Well, the only difference with an RE parser is that it does not need a tokenizer (probably that's why REs are used in lexical analysis). Every character in an RE is a token (excluding escape characters). But to parse an RE, whatever it is converted to, one has to deal with NFA/DFA/tree structures, all equivalent, that cannot be parsed by an RE itself.

Is there a way to negate a regular expression?

Given a regular expression R that describes a regular language (no fancy backreferences), is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible, as Wikipedia says:
The regular languages are closed under the various operations, that is, if the languages K and L are regular, so is the result of the following operations: […] the complement ¬L
For example, given the alphabet {a,b,c}, the inverse of the language (abc*)+ is (a|(ac|b|c).*)?
As DPenner has already pointed out in the comments, the inverse of a regular expression can be exponentially larger than the original expression. This makes inverting regular expressions unsuitable for implementing a negative partial expression syntax for searching purposes. Is there an algorithm that preserves the O(n*m) runtime characteristic (where n is the size of the regex and m is the length of the input) of regular expression matching and allows for negated subexpressions?
Unfortunately, the answer given by nhahtdh in the comments is as good as we can do (so far). Deciding whether a given regular expression generates all strings is PSPACE-complete. Since every problem in NP is also in PSPACE, an efficient solution to the universality problem would imply that P=NP.
If there were an efficient solution to your problem, would you be able to resolve the universality problem? Sure you would.
Use your efficient algorithm to generate a regular expression for the negation;
Determine whether the resulting regular expression generates the empty set.
Note that the problem "given a regular expression, does it generate the empty set" is fairly straightforward:
The regular expression {} generates the empty set.
(r + s) generates the empty set iff both r and s generate the empty set.
(rs) generates the empty set iff either r or s generates the empty set.
Nothing else generates the empty set.
Basically, it's pretty easy to tell whether a regular expression generates the empty set: just start evaluating the regular expression.
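The emptiness rules above translate almost directly into a recursive function over the syntax tree of the regular expression. The tree representation below (`Regex`, `RegexPtr`, `make`) is an assumption made for this sketch. The Kleene star is included among the "nothing else" cases, since even the star of the empty set still generates the empty string.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Regex;
using RegexPtr = std::shared_ptr<const Regex>;

// Hypothetical syntax tree for basic regular expressions.
struct Regex {
    enum class Kind { EmptySet, Symbol, Union, Concat, Star };
    Kind kind;
    char symbol;     // used by Symbol only
    RegexPtr left;   // used by Union, Concat and Star
    RegexPtr right;  // used by Union and Concat
};

RegexPtr make(Regex::Kind kind, char symbol = '\0',
              RegexPtr left = nullptr, RegexPtr right = nullptr) {
    return std::make_shared<Regex>(Regex{kind, symbol, std::move(left), std::move(right)});
}

// Applies the rules from the answer: {} is empty, (r + s) needs both empty,
// (rs) needs either one empty, and nothing else generates the empty set.
bool generatesEmptySet(const RegexPtr& r) {
    assert(r != nullptr);
    switch (r->kind) {
        case Regex::Kind::EmptySet:
            return true;
        case Regex::Kind::Union:
            return generatesEmptySet(r->left) && generatesEmptySet(r->right);
        case Regex::Kind::Concat:
            return generatesEmptySet(r->left) || generatesEmptySet(r->right);
        case Regex::Kind::Symbol:
        case Regex::Kind::Star:
            return false;
    }
    return false;
}
```

The function visits each node once, so it runs in time linear in the size of the expression it is given.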
(Note that while the above procedure is efficient in terms of the output length, it might not be efficient in terms of the input length, if the output length grows more than polynomially in the input length. However, if that were the case, we'd have the same result anyway, i.e., that your algorithm isn't really efficient, since it would take exponentially many steps to generate an exponentially longer output from a given input.)
Wikipedia says: ... if there exists at least one regex that matches a particular set then there exist an infinite number of such expressions. We can deduce from this statement that there is an infinite number of expressions that describe the language of all words except those described by R.
Again (as @nhahtdh also tried to explain), the simplest algorithm to address this question is to extend the scope of evaluation outside the context of the regular expression language itself. That is: match the strings you want to exclude (which represent a finite subset to work with) by using the original regular expression and then treat any failure to match as an actual match (out of an infinite set of other possibilities). So, if the result of the match is negative, your candidate strings are a subset of the valid solutions.
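In code, "treat any failure to match as a match" is just a negation of the match result. A minimal sketch with `std::regex`, where the function names are made up and the pattern (abc*)+ is the one from the question:

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if word is NOT in the language described by r (whole-string match).
bool matchesComplement(const std::string& word, const std::regex& r) {
    return !std::regex_match(word, r);
}

void exampleComplement() {
    const std::regex r("(abc*)+");
    assert(!matchesComplement("ab", r));     // in the language, so not in the complement
    assert(!matchesComplement("abccab", r));
    assert(matchesComplement("", r));        // not in the language
    assert(matchesComplement("aa", r));
}
```

This keeps the cost of ordinary matching, but it only negates a whole pattern. It does not give negated subexpressions inside a larger pattern.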

C++ infix to prefix conversion for logical conditions

I want to evaluate one expression in C++. To evaluate it, I want the expression to be converted to prefix format.
Here is an example
wstring expression = L"Feature1 And Feature2";
Here are other possible forms.
expression = L"Feature1 And (Feature2 Or Feature3)";
expression = L"Not Feature1 Or Feature3";
Here And, Or, Not are reserved words, and parentheses ("(", ")") are used for scope.
Not has the highest precedence.
And has the next precedence after Not.
Or has the next precedence after And.
White space is used as the delimiter. The expression has no other elements like TAB or NEWLINE.
I don't need arithmetic expressions. I can do the evaluation but can somebody help me to convert the strings to prefix notation?
You will need to construct the grammar up front. So why do all the parsing by hand?
Instead use a parser builder library like Boost-Spirit. Or lex/yacc or flex/bison.
Then use the AST generated by the parser builder to output the data in any way you see fit. Such as infix to prefix or postfix, ...etc.
I guess your intention is to evaluate a condition, hence you don't need a full-fledged parser.
First of all, you don't need to work with strings here.
1. Convert "Feature1" to, say, an Id (an integer which represents a feature).
So the statement "Feature1 And (Feature2 Or Feature3)" becomes, say, (1 & (2 | 3)).
From here on, you can use the standard infix-to-prefix conversion and evaluate the prefix notation.
Here is the algorithm to convert infix to prefix:
http://www.c4swimmers.esmartguy.com/in2pre.htm
http://www.programmersheaven.com/2/Art_Expressions_p1
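For a grammar this small, a hand-written recursive-descent converter is also an option. The sketch below is one possible way to do it, not the algorithm from the links above. It uses std::string instead of wstring for brevity, and all names (`tokenize`, `PrefixConverter`, `toPrefix`) are made up for this example. Precedence follows the question: Not binds tightest, then And, then Or.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits on spaces and treats '(' and ')' as tokens of their own.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : text) {
        if (ch == ' ' || ch == '(' || ch == ')') {
            if (!current.empty()) { tokens.push_back(current); current.clear(); }
            if (ch != ' ') tokens.push_back(std::string(1, ch));
        } else {
            current += ch;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

class PrefixConverter {
public:
    explicit PrefixConverter(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    std::string convert() {
        std::string result = parseOr();
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected token: " + tokens_[pos_]);
        return result;
    }

private:
    bool accept(const std::string& word) {
        if (pos_ < tokens_.size() && tokens_[pos_] == word) { ++pos_; return true; }
        return false;
    }
    std::string parseOr() {   // lowest precedence
        std::string left = parseAnd();
        while (accept("Or")) {
            std::string right = parseAnd();
            left = "Or " + left + " " + right;
        }
        return left;
    }
    std::string parseAnd() {
        std::string left = parseNot();
        while (accept("And")) {
            std::string right = parseNot();
            left = "And " + left + " " + right;
        }
        return left;
    }
    std::string parseNot() {  // highest precedence, already a prefix operator
        if (accept("Not")) return "Not " + parseNot();
        return parsePrimary();
    }
    std::string parsePrimary() {
        if (accept("(")) {
            std::string inner = parseOr();
            if (!accept(")")) throw std::runtime_error("missing ')'");
            return inner;
        }
        if (pos_ >= tokens_.size() || tokens_[pos_] == ")" ||
            tokens_[pos_] == "And" || tokens_[pos_] == "Or") {
            throw std::runtime_error("operand expected");
        }
        return tokens_[pos_++];
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

std::string toPrefix(const std::string& infix) {
    return PrefixConverter(tokenize(infix)).convert();
}

// The two sample expressions from the question.
void exampleConversions() {
    assert(toPrefix("Feature1 And (Feature2 Or Feature3)") == "And Feature1 Or Feature2 Feature3");
    assert(toPrefix("Not Feature1 Or Feature3") == "Or Not Feature1 Feature3");
}
```

Prefix notation needs no parentheses because every operator has a fixed number of operands, so the output can be evaluated with a simple stack or another small recursive function.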
Use a parser generator like the Lex/Yacc pair.