Semantics of identifier line in Python - python-2.7

What are the semantics of a Python 2.7 line containing ONLY an identifier, i.e. simply
a
or
something
?
If you know the exact place in the Reference, I'd be very pleased.
Thanks.

An identifier by itself is a valid expression. An expression by itself on a line is a valid statement.
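The full syntactic chain is a little more involved. To get sensible operator precedence, the grammar classifies something like "a and b" as technically both an and_test and an or_test. As a result, a simple identifier technically qualifies as over a dozen grammar rules. Here is the relevant part of the grammar (this excerpt is from the Python 3 grammar; in Python 2.7, expr_stmt starts from testlist instead of testlist_star_expr, but the chain down to NAME is otherwise the same):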
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
testlist_star_expr: (test|star_expr) (',' (test|star_expr))* [',']
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: ('(' [yield_expr|testlist_comp] ')' |
       '[' [testlist_comp] ']' |
       '{' [dictorsetmaker] '}' |
       NAME | NUMBER | STRING+ | '...' | 'None' | 'True' | 'False')
A stmt can be composed of a single simple_stmt, which can be composed of a single small_stmt, which can be composed of a single expr_stmt, and so on, down through testlist_star_expr, test, or_test, and_test, not_test, comparison, expr, xor_expr, and_expr, shift_expr, arith_expr, term, factor, power, atom, and finally NAME.
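The standard-library ast module offers a quick check of this (a sketch for Python 3; the helper name describe is hypothetical). The abstract syntax tree drops the long chain of intermediate rules above, so the whole line comes out as a single Expr statement whose value is a Name node.

```python
import ast


def describe(source):
    """Return (statement type, expression type) for a one-line expression statement."""
    module = ast.parse(source)
    (stmt,) = module.body  # exactly one statement expected
    # An expression statement is an ast.Expr that wraps the expression in .value
    return type(stmt).__name__, type(stmt.value).__name__


print(describe("a"))          # ('Expr', 'Name')
print(describe("something"))  # ('Expr', 'Name')
```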

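It's a simple expression statement, described in section 6.1, "Expression statements", of the Python 2 Language Reference: https://docs.python.org/2/reference/simple_stmts.html
The expression is evaluated and its value is discarded; evaluating a name that is not bound raises NameError. In interactive mode, if the value is not None, it is converted to a string with repr() and written to standard output.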
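A minimal sketch of both behaviours, written for Python 3 (where exec is a function; in 2.7 it is a statement). The helper run_line is hypothetical: it compiles the line in "exec" mode, as a script would, or in "single" mode, which is what the interactive interpreter uses and which passes each expression statement's value to sys.displayhook. Temporarily swapping in a list's append method captures those values instead of printing them.

```python
import sys


def run_line(line, namespace, interactive=False):
    """Run one line as a script would ("exec") or as the REPL would ("single").

    Returns the values handed to sys.displayhook (the default hook prints
    repr() of each one that is not None). A NameError for an unbound
    identifier propagates to the caller.
    """
    echoed = []
    code = compile(line, "<line>", "single" if interactive else "exec")
    saved_hook = sys.displayhook
    sys.displayhook = echoed.append  # capture instead of printing
    try:
        exec(code, namespace)
    finally:
        sys.displayhook = saved_hook
    return echoed


print(run_line("a", {"a": 42}))                    # [] (evaluated, then discarded)
print(run_line("a", {"a": 42}, interactive=True))  # [42] (the REPL would print 42)
```

Calling run_line("a", {}) raises NameError, matching the rule for unbound names.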

Related

Antlr4 problems with negative sign and operator

Hello, we have this ANTLR4 grammar:
grammar calc;
calculator: (d)*;
c
: c '*' c
| c '/' c
| c '+' c
| c '-' c
| '(' c ')'
| '-'?
| ID
;
d: ID '=' c;
NBR: [0-9]+;
ID: [a-zA-Z][a-zA-Z0-9]*;
WS: [ \t\r\n]+ -> skip;
The problem is that when I use a -, ANTLR4 doesn't recognize whether it is a sign or an operator for special inputs like (-2-4)*4. For inputs like this, ANTLR4 doesn't understand that the - before the 2 belongs to the constant 2 and is not an operator.
Just do something like this:
c
: '-' c
| c ('*' | '/') c
| c ('+' | '-') c
| '(' c ')'
| ID
| NBR
;
That way all these will be OK:
-1
- 2
-3-4
5+-6
-(7*8)
(-2-4)*4
For example, (-3-10)*10 is parsed like this: [parse tree image not preserved]
EDIT
This is what happens when I parse 9+38*(19+489*243/1)*1+3: [parse tree image not preserved]
| '-'?
should be:
| '-'? NBR
You need to specify that it's an NBR that may (or may not) be preceded by a -.
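To see why the first answer's ordering (with '-' c listed first) handles inputs like (-2-4)*4, here is a hand-written Python recursive-descent sketch that mirrors that rule. It is not ANTLR-generated code: unary '-' binds tightest, then * and /, then + and -, all left-associative. It handles only NBR operands; leaving out ID and using Fraction for exact division are assumptions of this sketch.

```python
import re
from fractions import Fraction


class Parser:
    """Mirrors: c : '-' c | c ('*'|'/') c | c ('+'|'-') c | '(' c ')' | NBR"""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|\S", text)  # NBR or a single symbol
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        self.pos += 1
        return tok

    def parse(self):
        tree = self.additive()
        if self.peek() is not None:
            raise SyntaxError("unexpected %r" % self.peek())
        return tree

    def additive(self):  # c ('+'|'-') c: lowest precedence, left-associative
        left = self.multiplicative()
        while self.peek() in ("+", "-"):
            op = self.take()
            left = (op, left, self.multiplicative())
        return left

    def multiplicative(self):  # c ('*'|'/') c
        left = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            left = (op, left, self.unary())
        return left

    def unary(self):  # '-' c: listed first, so it binds tightest
        if self.peek() == "-":
            self.take()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):  # '(' c ')' | NBR
        tok = self.take()
        if tok == "(":
            inner = self.additive()
            self.take(")")
            return inner
        if tok.isdigit():
            return int(tok)
        raise SyntaxError("unexpected %r" % tok)


def evaluate(tree):
    """Evaluate the nested-tuple tree built by Parser."""
    if isinstance(tree, int):
        return Fraction(tree)
    if tree[0] == "neg":
        return -evaluate(tree[1])
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


def calc(text):
    return evaluate(Parser(text).parse())


print(Parser("(-3-10)*10").parse())  # ('*', ('-', ('neg', 3), 10), 10)
print(calc("(-2-4)*4"))              # -24
```

The tree shows the leading '-' attached to 3 before the binary '-' is applied, which is the grouping the question was after.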

Antlr4: Can't understand why breaking something out into a subrule doesn't work

I'm still new at Antlr4, and I have what is probably a really stupid problem.
Here's a fragment from my .g4 file:
assignStatement
: VariableName '=' expression ';'
;
expression
: (value | VariableName)
| bin_op='(' expression ')'
| expression UNARY_PRE_OR_POST
| (UNARY_PRE_OR_POST | '+' | '-' | '!' | '~' | type_cast) expression
| expression MUL_DIV_MOD expression
| expression ADD_SUB expression
;
VariableName
: ( [a-z] [A-Za-z0-9_]* )
;
// Pre or post increment/decrement
UNARY_PRE_OR_POST
: '++' | '--'
;
// multiply, divide, modulus
MUL_DIV_MOD
: '*' | '/' | '%'
;
// Add, subtract
ADD_SUB
: '+' | '-'
;
And my sample input:
myInt = 10 + 5;
myInt = 10 - 5;
myInt = 1 + 2 + 3;
myInt = 1 + (2 + 3);
myInt = 1 + 2 * 3;
myInt = ++yourInt;
yourInt = (10 - 5)--;
The first sample line, myInt = 10 + 5;, produces this error:
line 22:11 mismatched input '+' expecting ';'
line 22:14 extraneous input ';' expecting {<EOF>, 'class', '{', 'interface', 'import', 'print', '[', '_', ClassName, VariableName, LITERAL, STRING, NUMBER, NUMERIC_LITERAL, SYMBOL}
I get similar issues with each of the lines.
If I make one change, a whole bunch of errors disappear:
| expression ADD_SUB expression
changed to this:
| expression ('+' | '-') expression
I've tried a bunch of things. I've tried using both lexer and parser rules (that is, calling it add_sub or ADD_SUB). I've tried a variety of combinations of parentheses.
I tried:
ADD_SUB: [+-];
What's annoying is the pre- and post-increment lines produce no errors as long as I don't have errors due to +-*. Yet they rely on UNARY_PRE_OR_POST. Of course, maybe it's not really using that and it's using something else that just isn't clear to me.
For now, I'm just eliminating the subrule syntax and will embed everything in the main rule. But I'd like to understand what's going on.
So... what is the proper way to do this?
Do not use literal tokens inside parser rules (unless you know what you're doing).
For the grammar:
expression
: '+' expression
| ...
;
ADD_SUB
: '+' | '-'
;
ANTLR will create a lexer rule for the literal '+', making the grammar really look like this:
expression
: T__0 expression
| ...
;
T__0 : '+';
ADD_SUB
: '+' | '-'
;
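This causes the input + never to become an ADD_SUB token, because T__0 will always match it first. That is simply how the lexer operates: it tries to match as many characters as possible with every lexer rule, and when two (or more) rules match the same number of characters, the one defined first "wins". The implicit tokens ANTLR creates for literals are defined before your own lexer rules, so T__0 beats ADD_SUB. UNARY_PRE_OR_POST still works because '++' and '--' are two characters long, which is longer than any competing single-character match.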
Do something like this instead:
expression
: value
| '(' expression ')'
| expression UNARY_PRE_OR_POST
| (UNARY_PRE_OR_POST | ADD | SUB | EXCL | TILDE | type_cast) expression
| expression (MUL | DIV | MOD) expression
| expression (ADD | SUB) expression
;
value
: ...
| VariableName
;
VariableName
: [a-z] [A-Za-z0-9_]*
;
UNARY_PRE_OR_POST
: '++' | '--'
;
MUL : '*';
DIV : '/';
MOD : '%';
ADD : '+';
SUB : '-';
EXCL : '!';
TILDE : '~';
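The lexer behaviour described above can be simulated in a few lines of Python. This is a sketch, not ANTLR: tokenize is a hypothetical helper, the token names T__0 and T__1 follow the answer's illustration (the real numbering depends on where each literal first appears in the grammar), and the NUMBER rule is assumed because the fragment doesn't show one. It compares the question's rules, with the implicit literal tokens placed first, against the answer's rules.

```python
import re


def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins.
    Whitespace is skipped, standing in for a WS -> skip rule."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # strict '>' keeps the earlier rule when lengths are equal
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError("no rule matches %r" % text[pos])
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Question's grammar: implicit literal tokens come before the user's rules.
QUESTION_RULES = [
    ("T__0", r"\+"),   # implicit token for the literal '+'
    ("T__1", r"-"),    # implicit token for the literal '-'
    ("VariableName", r"[a-z][A-Za-z0-9_]*"),
    ("UNARY_PRE_OR_POST", r"\+\+|--"),
    ("MUL_DIV_MOD", r"[*/%]"),
    ("ADD_SUB", r"[+-]"),
    ("NUMBER", r"[0-9]+"),  # assumed: the fragment does not show this rule
]

# Answer's grammar: one lexer rule per operator, no literals in parser rules.
ANSWER_RULES = [
    ("VariableName", r"[a-z][A-Za-z0-9_]*"),
    ("UNARY_PRE_OR_POST", r"\+\+|--"),
    ("MUL", r"\*"), ("DIV", r"/"), ("MOD", r"%"),
    ("ADD", r"\+"), ("SUB", r"-"),
    ("NUMBER", r"[0-9]+"),  # assumed, as above
]

print(tokenize("10 + 5", QUESTION_RULES))  # [('NUMBER', '10'), ('T__0', '+'), ('NUMBER', '5')]
print(tokenize("10 + 5", ANSWER_RULES))    # [('NUMBER', '10'), ('ADD', '+'), ('NUMBER', '5')]
```

With the question's rules ADD_SUB is never produced, so the parser alternative expression ADD_SUB expression can never match; "++yourInt" still yields UNARY_PRE_OR_POST because the two-character match is longer.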

Is the ANTLR parser greedy?

I don't understand why this antlr4 grammar
grammar antmath1;
expr
: '(' expr ')' # parensExpr
| op=('+'|'-') expr # unaryExpr
| left=expr op=('*'|'/') right=expr # infixExpr
| left=expr op=('+'|'-') right=expr # infixExpr
| value=NUM # numberExpr
;
NUM : [0-9]+;
WS : [ \t\r\n] -> channel(HIDDEN);
works properly:
ANTLR parse tree produced by -(5+9)+1000 (image not preserved); result=986
but why this one:
grammar antmath;
expr
: '(' expr ')' # parensExpr
| left=expr op=('*'|'/') right=expr # infixExpr
| left=expr op=('+'|'-') right=expr # infixExpr
| op=('+'|'-') expr # unaryExpr
| value=NUM # numberExpr
;
NUM : [0-9]+;
WS : [ \t\r\n] -> channel(HIDDEN);
fails:
ANTLR parse tree produced by the same expression (image not preserved); result=-1014
I expected the first grammar (which outputs the correct result) to produce the same result as the second grammar (which outputs the wrong result). My reasoning: the only alternative that admits '-' as its first token is #unaryExpr, so the parser generated from either grammar would try to match that alternative first. Then, provided the parser is greedy (in both grammars), I would expect it to take "(5+9)+1000" as a whole and match it as expr, which it can, because it is a valid expr.
Where's the fault in my reasoning?
"...the grammars would try to match that rule first"
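It does. However, you've made unary minus have lower precedence than binary plus. In an ANTLR4 left-recursive rule, alternatives listed earlier bind tighter, and the operand of a prefix alternative may contain any operator that binds at least as tightly as that alternative. Because #unaryExpr comes after both #infixExpr alternatives in the second grammar, its operand expr swallows all of (5+9)+1000.
That means that the expression is being interpreted as -((5+9)+1000) = -1014 instead of (-(5+9))+1000 = 986.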
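The effect of alternative order can be reproduced with a small precedence-climbing evaluator in Python. This is a sketch that roughly mirrors how ANTLR4 treats left-recursive rules, not ANTLR's generated code; the numeric levels in GRAMMAR1 and GRAMMAR2 are assumptions chosen so that earlier alternatives bind tighter, and '/' uses true division.

```python
import operator
import re

BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}


def evaluate(text, binary_prec, unary_prec):
    """Precedence-climbing evaluator; a higher level binds tighter.

    A prefix '+'/'-' parses its operand at unary_prec, so the operand may
    contain only binary operators whose level is >= unary_prec.
    """
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def expr(min_prec):
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("+", "-"):              # #unaryExpr
            operand = expr(unary_prec)
            left = -operand if tok == "-" else operand
        elif tok == "(":                   # #parensExpr
            left = expr(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
        else:                              # #numberExpr
            left = int(tok)
        # #infixExpr: keep extending while the next operator binds tightly enough
        while (pos < len(tokens) and tokens[pos] in BINARY
               and binary_prec[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            right = expr(binary_prec[op] + 1)  # +1 makes it left-associative
            left = BINARY[op](left, right)
        return left

    result = expr(0)
    if pos != len(tokens):
        raise SyntaxError("unexpected %r" % tokens[pos])
    return result


# Levels follow the order of the alternatives: earlier ones bind tighter.
GRAMMAR1 = {"binary_prec": {"*": 2, "/": 2, "+": 1, "-": 1}, "unary_prec": 3}  # unary first
GRAMMAR2 = {"binary_prec": {"*": 3, "/": 3, "+": 2, "-": 2}, "unary_prec": 1}  # unary last

print(evaluate("-(5+9)+1000", **GRAMMAR1))  # 986
print(evaluate("-(5+9)+1000", **GRAMMAR2))  # -1014
```

The parser is equally "greedy" in both cases; only the level at which the unary operand stops absorbing operators differs.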

How to remove ambiguity in EBNF Instaparse grammar

How can I prevent the "," literal in the structure rule from being parsed as an operator in the following EBNF grammar for Instaparse?
Grammar:
structure = atom <"("> term ("," term)* <")">
term = atom | number | structure | variable | "(" term ")" | term operator term
operator = "," | ";" | "\\=" | "=="
Using the comma both as a separator and as an operator, as you do, makes the comma context-sensitive, which EBNF on its own can't deal with.
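A small Python sketch can make the ambiguity concrete by counting the parse trees the grammar allows. It is a brute-force, memoized count over token spans, not Instaparse itself; the tokenization and the definitions of atom, number and variable (which the question doesn't show) are assumptions. With the grammar as written, f(a,b) has two parses: a structure with two arguments, or a structure with a single argument a,b that uses the comma operator.

```python
import re
from functools import lru_cache

OPERATORS = {",", ";", "\\=", "=="}


def count_parses(text):
    """Count the distinct parse trees of `text` as a `term` in the grammar above."""
    # Assumed tokenization: \= and ==, identifiers/numbers, then single symbols.
    tokens = re.findall(r"\\=|==|\w+|\S", text)

    def is_atom(tok):  # assumption: atoms start with a lowercase letter
        return tok[0].islower()

    def is_leaf(tok):  # atom | number | variable, all single tokens here
        return tok[0].isalnum() or tok[0] == "_"

    @lru_cache(maxsize=None)
    def term(i, j):
        # Number of ways tokens[i:j] is a term.
        n = 1 if j - i == 1 and is_leaf(tokens[i]) else 0
        n += structure(i, j)
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += term(i + 1, j - 1)  # "(" term ")"
        for k in range(i + 1, j - 1):  # term operator term
            if tokens[k] in OPERATORS:
                n += term(i, k) * term(k + 1, j)
        return n

    @lru_cache(maxsize=None)
    def structure(i, j):
        # atom <"("> term ("," term)* <")">
        if (j - i >= 4 and is_atom(tokens[i])
                and tokens[i + 1] == "(" and tokens[j - 1] == ")"):
            return arguments(i + 2, j - 1)
        return 0

    @lru_cache(maxsize=None)
    def arguments(i, j):
        # term ("," term)*, where each "," here acts as a separator
        n = term(i, j)
        for k in range(i + 1, j - 1):
            if tokens[k] == ",":
                n += term(i, k) * arguments(k + 1, j)
        return n

    return term(0, len(tokens))


print(count_parses("f(a,b)"))    # 2
print(count_parses("f(a,b,c)"))  # 5
```

In Instaparse itself, insta/parses returns all parses of an input, which is a direct way to confirm the same ambiguity.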

How to disable non-standard features in SML/NJ

SML/NJ provides a series of non-standard features, such as higher-order modules, vector literal syntax, etc.
Is there a way to disable these non-standard features in SML/NJ, through some command-line param maybe, or, ideally, using a CM directive?
Just by looking at the grammar used by the parser, I'm going to say that there is not a way to do this. From "admin/base/compiler/Parse/parse/ml.grm":
apat' : OP ident (VarPat [varSymbol ident])
| ID DOT qid (VarPat (strSymbol ID :: qid varSymbol))
| int (IntPat int)
| WORD (WordPat WORD)
| STRING (StringPat STRING)
| CHAR (CharPat CHAR)
| WILD (WildPat)
| LBRACKET RBRACKET (ListPat nil)
| LBRACKET pat_list RBRACKET (ListPat pat_list)
| VECTORSTART RBRACKET (VectorPat nil)
| VECTORSTART pat_list RBRACKET (VectorPat pat_list)
| LBRACE RBRACE (unitPat)
| LBRACE plabels RBRACE (let val (d,f) = plabels
in RecordPat{def=d,flexibility=f}
end)
The VectorPat stuff is fully mixed in with the rest of the patterns. A recursive grep for VectorPat will also show that there aren't any options to turn this off anywhere else.